
> immediately segfault somewhere during loading

Utterly terrifying. They somehow segfault before executing a single instruction at the program's entry point. Even the likes of gdb are rendered powerless before the might of this uber segfault. I was reduced to posting readelf dumps on Stack Overflow. Mercifully, people immediately spotted the problem (unsorted PT_LOAD segments).
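For anyone who wants to check for this without eyeballing readelf output, here is a rough sketch of the test. It assumes a 64-bit little-endian ELF and relies on the gABI rule that PT_LOAD entries appear in ascending `p_vaddr` order. The function names are made up for illustration, and this is not what the kernel actually runs.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def load_segment_vaddrs(elf: bytes) -> list:
    """Return p_vaddr of every PT_LOAD entry, in program header table order.

    Sketch only: handles ELF64 little-endian and does no bounds checking.
    """
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if elf[4] != 2 or elf[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    # Elf64_Ehdr: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    vaddrs = []
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # Elf64_Phdr starts with: p_type (4), p_flags (4), p_offset (8), p_vaddr (8)
        p_type, _flags, _offset, p_vaddr = struct.unpack_from("<IIQQ", elf, base)
        if p_type == PT_LOAD:
            vaddrs.append(p_vaddr)
    return vaddrs


def load_segments_sorted(elf: bytes) -> bool:
    """True if the PT_LOAD entries are in ascending p_vaddr order."""
    vaddrs = load_segment_vaddrs(elf)
    return vaddrs == sorted(vaddrs)
```

Usage would be something like `load_segments_sorted(open("./yourprog", "rb").read())`, where `./yourprog` is a placeholder path.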




This was one of my favorite StackOverflow debugging experiences: https://stackoverflow.com/a/12575044/1204143. A user posted that their code, `int main() { return 0; }`, was crashing with SIGFPE. I traced it to the fact that they had compiled the code with a fairly new GCC, but ran it on a machine with a very old libc - their old libc didn't understand `DT_GNU_HASH` and was trying to look symbols up in an empty `DT_HASH` (computing the bucket indices mod 0 - hence SIGFPE).
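To make the mod 0 part concrete, here is a small sketch of a SysV-style `DT_HASH` bucket computation. The hash is the classic ELF symbol hash, truncated to 32 bits. `bucket_index` is a hypothetical helper, not code from any libc. Python raises `ZeroDivisionError` at the point where the C equivalent on x86 would trap and be reported as SIGFPE.

```python
def sysv_elf_hash(name: bytes) -> int:
    """Classic SysV ELF symbol hash, kept to 32 bits."""
    h = 0
    for c in name:
        h = ((h << 4) + c) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


def bucket_index(name: bytes, nbucket: int) -> int:
    """Pick the hash bucket for a symbol name.

    With an empty DT_HASH table nbucket is 0, so the modulo divides by zero.
    """
    return sysv_elf_hash(name) % nbucket


if __name__ == "__main__":
    print(bucket_index(b"main", 3))  # normal table: some bucket in range(3)
    try:
        bucket_index(b"main", 0)     # empty table, as in the story above
    except ZeroDivisionError as exc:
        print("division by zero:", exc)
```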

It's possible to debug these! If it's a segfault (as opposed to the kernel just refusing to load your file), it's usually because ld.so (the dynamic linker) has crashed, and you can debug ld.so explicitly (`gdb ld-linux.so.2`, then `run ./yourprog`). With symbols it's usually feasible to identify the code in the dynamic linker that has crashed.


> Even the likes of gdb are rendered powerless before the might of this uber segfault.

It's quite easy to debug crashes in the dynamic linker if you use a more powerful debugger. For example, there is a Graal-based AMD64 VM [1] which can record an execution trace of the entire program run, including the dynamic linker. You can then analyze the trace offline and see exactly what happened, what didn't happen, or where the linker crashed and how it got there. In case you ever wondered roughly what the kernel does when loading an ELF file: look at the re-implementation in the ElfLoader class of that project.

[1] https://github.com/pekd/tracer


In my case it was a static freestanding nolibc program, there was no dynamic linker or ELF interpreter. :)
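A quick way to confirm that a binary really has no ELF interpreter is to look for a PT_INTERP program header. This is a minimal sketch, again assuming a 64-bit little-endian ELF; `elf_interpreter` is an illustrative name, not an existing tool or API.

```python
import struct

PT_INTERP = 3  # program header type that names the ELF interpreter


def elf_interpreter(elf: bytes):
    """Return the PT_INTERP path as a str, or None if there is no PT_INTERP.

    Sketch only: handles ELF64 little-endian and does no bounds checking.
    """
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if elf[4] != 2 or elf[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    # Elf64_Ehdr: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, ...
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", elf, base
        )
        if p_type == PT_INTERP:
            raw = elf[p_offset:p_offset + p_filesz]
            # The path is stored as a NUL-terminated string
            return raw.split(b"\0", 1)[0].decode()
    return None
```

A static binary like the one described should give `None` here, which rules out the dynamic linker as the thing that crashed.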

The shell's execve jumps directly to the entry point I provided. The execve itself was segfaulting somehow. I couldn't think of anything to do short of running this entire thing in a virtual machine and tracing the kernel itself to see which branch of the ELF loader I was ending up in.



