Author Topic: dlopen() broken on C860 with ROM 1.4JP???  (Read 3205 times)

dhns

  • Hero Member
  • *****
  • Posts: 699
    • View Profile
    • http://www.goldelico.com
dlopen() broken on C860 with ROM 1.4JP???
« on: October 26, 2004, 03:32:17 pm »
I am trying to get an application running on the C860 (ROM 1.4JP). It already worked on the SL5500G (ROM 3.10), but on the C860 it crashes with an "Illegal instruction" signal.

Now, the strange thing is that it does not crash when run under the debugger gdb!
So I was only able to track down the bug by inserting fprintf() calls.

It turns out to be the dlopen() call which does not return (i.e. the fprintf before this call prints, the one after it doesn't, because I get the SIGILL).

This is strange as the function should only load the module into memory - not execute anything.

Does anybody have an idea or similar experience with dynamically loading code? Or does anybody know of a difference in glibc or the Linux kernel between the 5500 and the C860?

Just to note: RTLD_NOW or RTLD_LAZY does not make any difference.
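The fprintf() bracketing described above can be sketched roughly as follows. This is only an illustration: `load_module` is a hypothetical helper name, not the code of the actual application, and on older glibc versions the program has to be linked with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: try to load `path` and log to stderr before and
 * after the call, so that a crash inside dlopen() can be bracketed.
 * `flags` is RTLD_NOW or RTLD_LAZY. */
void *load_module(const char *path, int flags)
{
    fprintf(stderr, "before dlopen(\"%s\")\n", path);
    fflush(stderr);   /* stderr is normally unbuffered; explicit for clarity */

    dlerror();        /* clear any stale error string */
    void *handle = dlopen(path, flags);

    fprintf(stderr, "after dlopen(\"%s\")\n", path);
    if (handle == NULL) {
        /* dlerror() returns NULL if no error is pending */
        const char *msg = dlerror();
        fprintf(stderr, "dlopen error(\"%s\"): %s\n",
                path, msg != NULL ? msg : "unknown");
    }
    return handle;
}
```

If the "before" line appears and the "after" line does not, the signal was raised while dlopen() was still running, which is the situation described in this post.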

Any pointers are welcome.

-- hns
SL5500G, C860, C3100, WLAN, RTM8000, Powerbook G4, and others...
http://www.handheld-linux.com
http://www.quantum-step.com

dhns

« Reply #1 on: October 27, 2004, 09:09:37 am »
Now, I have tracked the issue down a little further. It is the _init mechanism of dlopen(). The manual says that a procedure _init() is called - if it is present - right after loading the module.

I have tried to compile the loaded module with -nostartfiles according to http://www.linux.com/howtos/Program-Librar...ellaneous.shtml and then the illegal instruction crash disappears - but the initialization no longer works :-)

So I have added my own _init() which just prints a "here I am" - and this is called without crash.

If I then remove the -nostartfiles, the linker complains about a duplicate _init in crti.o - but I could not find the source code of that to understand what it does.

What still puzzles me is that it works when running within the control of gdb and why it works on the SL5500G with ROM 3.10.
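As a side note on the duplicate _init: with GCC, a module can run its own initialization code without defining _init() and without -nostartfiles by marking a function with the constructor attribute. The following is a minimal sketch of that technique, not the code of the module discussed here; `module_init` and `module_is_ready` are hypothetical names.

```c
#include <stdio.h>

static int module_ready = 0;

/* GCC runs constructor functions automatically when the object that
 * contains them is loaded: before main() for an executable, or from
 * within dlopen() for a shared module. */
__attribute__((constructor))
static void module_init(void)
{
    module_ready = 1;
    fprintf(stderr, "here I am\n");
}

/* Lets callers check that the initialization has run. */
int module_is_ready(void)
{
    return module_ready;
}
```

Because the function does not have to be called _init, it does not clash with the _init that crti.o provides.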

-- hns

dhns

« Reply #2 on: December 01, 2004, 04:04:21 am »
No progress so far. Any hints are appreciated.

The situation is as follows:
* I can dlopen() many of the shared libraries
* one of them fails loading - even if it contains no code
* I can make it stop failing by commenting out parts of the code
* it never fails when the process is running under control of gdb
* it does not fail if I do not load other libraries before
* it never failed on the SL5500
* failure also depends on having fprintf() in the code that calls dlopen()

So, I have two hypotheses:

1. it has to do with the CPU cache not being properly flushed by dlopen - this would be a Linux kernel bug. Is there a system call to switch off the instruction cache on a C860?

2. some pointer is wrongly initialized and this depends on the code size (would not explain that it does not fail with gdb)


-- hns
« Last Edit: December 01, 2004, 04:09:20 am by dhns »

dhns

« Reply #3 on: December 02, 2004, 04:46:43 pm »
Now, I have made some progress (but hints are still appreciated!):

Code:
ulimit -c unlimited

made the Zaurus C860 write a core dump. But gdb complains:

Code:
Xapp path: ./mySystemUIServer.app/Contents/Linux-ARM/mySystemUIServer
GNU gdb 5.2
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "armv4l-unknown-linux-gnu"...
(no debugging symbols found)...

warning: core file may not match specified executable file.
Core was generated by `'.
Program terminated with signal 4, Illegal instruction.

warning: wrong size fpregset struct in core file
Reading symbols from /home/myPDA/lib/libX11.so.6...
(no debugging symbols found)...done.
....
(gdb) bt
#0  0x40020034 in ?? ()
Cannot access memory at address 0x0
(gdb)

By the way, I have tried to link with -rdynamic as suggested in some descriptions of Linux's dlopen() - no difference.
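For completeness, the effect of `ulimit -c unlimited` can also be requested from inside the program. This is a rough sketch using getrlimit()/setrlimit(); `enable_core_dumps` is a hypothetical name, and the sketch only raises the soft limit up to the current hard limit, so it matches the shell command only if the hard limit is already unlimited.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success and -1 on failure (errno is set by
 * getrlimit() or setrlimit()). */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;   /* soft limit may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Calling this at the start of main() means a core file can be written even if the program is started from a shell where the limit was not raised.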

-- hns
« Last Edit: December 02, 2004, 04:47:43 pm by dhns »

dhns

« Reply #4 on: December 05, 2004, 03:31:05 pm »
I have now narrowed everything down to some loadable modules. One of them fails depending on its contents (i.e. whether I compile in functions or leave it empty).

But now look at a strange command sequence ("run mySys*" calls a compiled application that just tries to dlopen "myClock", "myBattery", and "myWLAN"):

Code:
# run mySys*
starting
dlopen error("myClock"): (null)
dlopen error("myBattery"): (null)
Illegal instruction (core dumped)
# run mySys*
starting
dlopen error("myClock"): (null)
dlopen error("myBattery"): (null)
Illegal instruction (core dumped)
# run mySys*
starting
dlopen error("myClock"): (null)
dlopen error("myBattery"): (null)
Illegal instruction (core dumped)
# run mySys*
starting
Illegal instruction (core dumped)
# run mySys*
starting
Illegal instruction (core dumped)
# run mySys*
starting
Illegal instruction (core dumped)
# run mySys*
starting
Illegal instruction (core dumped)
# rm core
# rm core
rm: cannot remove `core': No such file or directory
# run mySys*
starting
Illegal instruction (core dumped)
# rm core
# run mySys*
starting
dlopen error("myClock"): (null)
dlopen error("myBattery"): (null)
Illegal instruction (core dumped)
# run mySys*
starting
Illegal instruction (core dumped)
#

It always fails when loading "myWLAN", but sometimes already on "myClock", and it works again after doing something else. So based on this trace I suspect that something is really wrong with the dynamic loader and/or the CPU caches. I know that the SL5600 has such a cache issue, but the C860? Or do I have a bad memory cell?

Any kernel guru out there having an idea?
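Since the crash does not show up under gdb, one way to get the faulting address from inside the process itself is a SIGILL handler installed with sigaction(). The following is only a sketch under that assumption; `install_sigill_reporter` and `sigill_seen` are hypothetical names. The handler prints si_addr (which is only meaningful for a real fault, not for a signal sent with raise() or kill()) and then restores the default action, so that a real illegal instruction still terminates the process and produces the core dump.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t sigill_count = 0;

/* Async-signal-safe hex output: uses only write(), no stdio. */
static void write_hex(unsigned long v)
{
    char buf[2 + 2 * sizeof v + 1];
    size_t i = sizeof buf;

    buf[--i] = '\n';
    do {
        buf[--i] = "0123456789abcdef"[v & 0xf];
        v >>= 4;
    } while (v != 0);
    buf[--i] = 'x';
    buf[--i] = '0';
    ssize_t n = write(STDERR_FILENO, buf + i, sizeof buf - i);
    (void)n;
}

static void on_sigill(int sig, siginfo_t *info, void *context)
{
    static const char msg[] = "SIGILL at address ";
    (void)context;

    sigill_count++;
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    write_hex((unsigned long)info->si_addr);

    /* Restore the default action: after a real fault the instruction is
     * retried on return and the process then dies with a core dump. */
    signal(sig, SIG_DFL);
}

/* Returns 0 on success, -1 on failure. */
int install_sigill_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigill;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGILL, &sa, NULL);
}

/* Number of times the handler has run. */
int sigill_seen(void)
{
    return sigill_count;
}
```

The printed address can then be compared with the load addresses of the modules to see whether the illegal instruction lies inside a freshly loaded library or somewhere else.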

-- hns

dhns

« Reply #5 on: December 13, 2004, 01:23:55 pm »
FINALLY FOUND!

I have found the reason. It is not really a bug in dlopen() but
something I would consider a weakness.

It turned out that -fPIC was missing when linking the libraries that
later failed.

Why this crashes *sometimes* and not always is something I don't really
understand. And there *might* or *should* be a flag within an
executable file that tells if it was compiled -fPIC or not. And
dlopen() *should* complain if it can't properly relocate such a
module.
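On the wish for such a flag: ELF has no "compiled with -fPIC" bit as such, but non-PIC code in a shared object usually leaves relocations against the text segment, and the linker marks that with a DT_TEXTREL entry (or the DF_TEXTREL bit in DT_FLAGS) in the dynamic section. `readelf -d` shows the same entry. The following is a minimal sketch of a check for it, not a complete ELF parser. It assumes glibc's <link.h> and a file with the native word size and byte order, so it has to run on the target itself rather than on a cross-compilation host; `has_textrel` is a hypothetical name.

```c
#include <elf.h>
#include <link.h>    /* ElfW() picks Elf32_* or Elf64_* for the native word size */
#include <stdio.h>
#include <string.h>

/* Returns 1 if the ELF file at `path` has text relocations (a sign of
 * non-PIC code), 0 if it has none, and -1 if the file cannot be read
 * or is not an ELF file of the native class. */
int has_textrel(const char *path)
{
    FILE *f = fopen(path, "rb");
    int result = -1;
    ElfW(Ehdr) eh;

    if (f == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != (sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32))
        goto out;

    result = 0;
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        ElfW(Phdr) ph;
        long off = (long)eh.e_phoff + (long)i * eh.e_phentsize;

        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            goto out;
        }
        if (ph.p_type != PT_DYNAMIC)
            continue;

        /* Walk the dynamic section until DT_NULL or the end of the segment. */
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0) {
            result = -1;
            goto out;
        }
        for (size_t n = ph.p_filesz / sizeof(ElfW(Dyn)); n > 0; n--) {
            ElfW(Dyn) d;

            if (fread(&d, sizeof d, 1, f) != 1) {
                result = -1;
                goto out;
            }
            if (d.d_tag == DT_NULL)
                break;
            if (d.d_tag == DT_TEXTREL ||
                (d.d_tag == DT_FLAGS && (d.d_un.d_val & DF_TEXTREL))) {
                result = 1;
                goto out;
            }
        }
    }
out:
    fclose(f);
    return result;
}
```

A loader program could call this before dlopen() and print a warning for any module where it returns 1, which gives roughly the complaint asked for above.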

Thanks to all who have given me support! This will bring a new release of mySTEP/myPDA for the C860 models soon.

-- hns