
Also, don't forget LD_DEBUG for debugging your handmade ELFs:

    man 8 ld.so

Your advice is completely on point. I spent some time a while back hand-writing ELF files in GNU Assembler. Statically linked executables are, indeed, quite straightforward. That said, there are definitely some mysteries, like the difference between p_vaddr and p_paddr in the program headers. Also, the difference between segments and sections is never really explained in the main references.
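For anyone who wants to look at p_vaddr and p_paddr side by side, here is a minimal sketch that reads the program headers (the segment table) out of a little-endian ELF64 image. The helper name `read_program_headers` and the `Phdr` tuple are made up for illustration; the field order follows the Elf64_Ehdr and Elf64_Phdr layouts, and the sketch deliberately ignores 32-bit and big-endian files.

```python
import struct
from collections import namedtuple

# Little-endian ELF64 layouts: Elf64_Ehdr is 64 bytes, Elf64_Phdr is 56 bytes.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
PHDR = struct.Struct("<IIQQQQQQ")

# Hypothetical container; field names mirror the Elf64_Phdr members.
Phdr = namedtuple(
    "Phdr", "p_type p_flags p_offset p_vaddr p_paddr p_filesz p_memsz p_align"
)

PT_LOAD = 1  # loadable segment


def read_program_headers(data):
    """Return the program headers of a little-endian ELF64 image as Phdr tuples."""
    fields = EHDR.unpack_from(data, 0)
    ident = fields[0]
    # ident[4] == 2 means ELFCLASS64, ident[5] == 1 means ELFDATA2LSB.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    return [
        Phdr(*PHDR.unpack_from(data, e_phoff + i * e_phentsize))
        for i in range(e_phnum)
    ]
```

Something like `read_program_headers(open("a.out", "rb").read())` should list the segments; sections live in a separate table at e_shoff, which this sketch does not touch.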

Dynamically linked ELF executables are somewhat of a different beast, though. You need some decent familiarity with assembly and the broad strokes of program loading to make heads or tails of the details around relocation symbols.

When I was playing around with these things, I never succeeded at hand-writing x86-64 dynamically linked ELF to do anything other than immediately segfault somewhere during loading. Maybe I should give it another try.

Not too far back, I spent some time decompiling (small) static ELF binaries by hand. That really hammered in the x86-64 ISA and its subtleties. I also have Levine's "Linkers and Loaders" sitting on my shelf, which I hope to get around to reading sooner rather than later.




> immediately segfault somewhere during loading

Utterly terrifying. They somehow segfault before executing a single instruction in the program's entry point. Even the likes of gdb are rendered powerless before the might of this uber segfault. I was reduced to posting readelf dumps on stackoverflow. Mercifully people immediately spotted the problem (unsorted PT_LOAD segments).
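The ELF spec does say that loadable segment entries must appear in ascending p_vaddr order, so this is cheap to check before running anything. A rough sketch, assuming the program headers have already been reduced to (p_type, p_vaddr) pairs in table order; the function name is made up for illustration:

```python
PT_LOAD = 1  # loadable segment


def unsorted_load_segments(phdrs):
    """Return (i, j) index pairs of consecutive PT_LOAD entries whose p_vaddr decreases.

    `phdrs` is a list of (p_type, p_vaddr) pairs in program header table order.
    """
    loads = [(i, vaddr) for i, (p_type, vaddr) in enumerate(phdrs) if p_type == PT_LOAD]
    return [
        (prev_i, cur_i)
        for (prev_i, prev_v), (cur_i, cur_v) in zip(loads, loads[1:])
        if cur_v < prev_v
    ]
```

An empty result means the PT_LOAD entries are in order; any pair it returns points at the two table indices to swap.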


This was one of my favorite StackOverflow debugging experiences: https://stackoverflow.com/a/12575044/1204143. A user posted that their code, `int main() { return 0; }`, was crashing with SIGFPE. I traced it to the fact that they had compiled the code with a fairly new GCC, but ran it on a machine with a very old libc - their old libc didn't understand `DT_GNU_HASH` and was trying to look symbols up in an empty `DT_HASH` (computing the bucket indices mod 0 - hence SIGFPE).
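To make the "mod 0" part concrete, here is a small sketch of the classic SysV ELF hash and the bucket selection a `DT_HASH` lookup performs. It is written from the hash function in the ELF spec, not copied from any libc; `bucket_index` is a made-up helper name, and the hash is masked to 32 bits here as an assumption about how implementations treat it.

```python
def sysv_hash(name: bytes) -> int:
    """Classic SysV ELF symbol hash, kept to 32 bits."""
    h = 0
    for c in name:
        h = ((h << 4) + c) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF  # clear the top four bits
    return h


def bucket_index(name: bytes, nbucket: int) -> int:
    """Pick the hash bucket for a symbol name; nbucket comes from the DT_HASH table."""
    # With an empty table nbucket is 0 and this divides by zero. Python raises
    # ZeroDivisionError; the equivalent integer division in C on x86 traps,
    # which the kernel delivers as SIGFPE.
    return sysv_hash(name) % nbucket
```

Calling `bucket_index(b"printf", 0)` raises ZeroDivisionError, which is the Python-level analogue of the crash described above.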

It's possible to debug these! If it's a segfault (as opposed to the kernel just refusing to load your file), it's usually because ld.so (the dynamic linker) has crashed, and you can debug ld.so explicitly (gdb ld-linux.so.2 ; run ./yourprog). With symbols it's usually feasible to identify the code in the dynamic linker that has crashed.


> Even the likes of gdb are rendered powerless before the might of this uber segfault.

It's quite easy to debug crashes in the dynamic linker if you use a more powerful debugger. For example there is a Graal based AMD64 VM [1] which can record an execution trace of the entire program run, including the dynamic linker, and then you can analyze the execution trace offline and see exactly what happened / what didn't happen or where the linker crashed and how it got there. In case you ever wondered what the kernel roughly does when loading an ELF file: look at the re-implementation in the ElfLoader class of that project.

[1] https://github.com/pekd/tracer


In my case it was a static freestanding nolibc program, there was no dynamic linker or ELF interpreter. :)

The shell's execve jumps directly to the entry point I provided. The execve itself was segfaulting somehow. I couldn't think of anything to do short of running this entire thing in a virtual machine and tracing the kernel itself to see which branch of the ELF loader I was ending up in.


> like the difference between p_vaddr and p_paddr in the program headers.

I see it being used pretty much exclusively in embedded systems, and it looks like some of them set p_paddr to an address but leave p_vaddr at 0. So its only apparent use is to indicate when virtual memory is not expected to be used, but otherwise its function is identical to p_vaddr. "Put the segment in memory here, please."



