If you want to pull back the curtain, I highly recommend hand-writing your own small ELF binary. In particular, on Linux, the C ELF structures are available via:
#include <elf.h>
Writing C code to generate an ELF makes it apparent that an ELF is just a couple of structs and some assembled code dumped to a file. (I've used Keystone with decent success for assembly.) It's actually pretty easy to build something that works if you follow along with the man page:
man 5 elf
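To make that concrete, here is a minimal sketch that writes a complete executable using nothing but the <elf.h> structs. It makes several assumptions. The target is x86-64 Linux. The load address 0x400000 and the name write_tiny_elf were chosen for illustration. The 12 bytes of machine code are hand-assembled (exit(42) via the raw syscall) rather than produced by Keystone. There are no section headers at all, because the kernel only needs the ELF header and one PT_LOAD program header.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define BASE_VADDR 0x400000ULL /* arbitrary, page-aligned load address */

/* Hand-assembled x86-64 code (assumption: x86-64 Linux syscall ABI):
 *   mov edi, 42   ; exit status
 *   mov eax, 60   ; SYS_exit
 *   syscall
 */
static const unsigned char code[] = {
    0xbf, 0x2a, 0x00, 0x00, 0x00,
    0xb8, 0x3c, 0x00, 0x00, 0x00,
    0x0f, 0x05,
};

/* Writes a static, section-less ELF64 executable to `path`.
 * Layout: ELF header, one program header, then the code.
 * Returns 0 on success, -1 on error. */
int write_tiny_elf(const char *path)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    const Elf64_Off code_off = sizeof eh + sizeof ph; /* 64 + 56 = 120 */
    const Elf64_Xword total = code_off + sizeof code;

    memset(&eh, 0, sizeof eh);
    memcpy(eh.e_ident, ELFMAG, SELFMAG);
    eh.e_ident[EI_CLASS] = ELFCLASS64;
    eh.e_ident[EI_DATA] = ELFDATA2LSB;
    eh.e_ident[EI_VERSION] = EV_CURRENT;
    eh.e_ident[EI_OSABI] = ELFOSABI_SYSV;
    eh.e_type = ET_EXEC;
    eh.e_machine = EM_X86_64;
    eh.e_version = EV_CURRENT;
    eh.e_entry = BASE_VADDR + code_off; /* first instruction after the headers */
    eh.e_phoff = sizeof eh;
    eh.e_ehsize = sizeof eh;
    eh.e_phentsize = sizeof ph;
    eh.e_phnum = 1;
    /* e_shoff, e_shnum and e_shstrndx stay 0: no section headers. */

    memset(&ph, 0, sizeof ph);
    ph.p_type = PT_LOAD;
    ph.p_flags = PF_R | PF_X;
    ph.p_offset = 0; /* map the whole file, headers included */
    ph.p_vaddr = BASE_VADDR;
    ph.p_paddr = BASE_VADDR;
    ph.p_filesz = total;
    ph.p_memsz = total;
    ph.p_align = 0x1000;

    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int ok = fwrite(&eh, sizeof eh, 1, f) == 1
          && fwrite(&ph, sizeof ph, 1, f) == 1
          && fwrite(code, sizeof code, 1, f) == 1;
    if (fclose(f) != 0) ok = 0;
    if (!ok) return -1;
    return chmod(path, 0755); /* make it executable */
}
```

On x86-64 Linux, running the generated file and then `echo $?` should print 42.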
For debugging handmade ELF files, it's handy to explicitly run the system loader under strace:
strace /lib/ld-linux.so.3 ./homemade_elf
You can find the path to the interpreter that will be used via something like:
readelf -a "$(which ls)" | grep -i interpreter
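The same information can be read with the <elf.h> structs directly, because the interpreter path is simply the contents of the PT_INTERP segment. The sketch below assumes a 64-bit ELF whose byte order matches the host, already read into memory. find_interp is a hypothetical name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns the PT_INTERP path stored in an in-memory ELF64 image, or NULL
 * if there is none (e.g. a static executable) or the image looks broken.
 * Assumes the file's byte order matches the host's. */
const char *find_interp(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh) return NULL;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 || eh.e_phoff > len)
        return NULL;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        size_t off = (size_t)eh.e_phoff + i * (size_t)eh.e_phentsize;
        if (off > len || len - off < sizeof ph) return NULL;
        memcpy(&ph, buf + off, sizeof ph);
        if (ph.p_type != PT_INTERP) continue;

        if (ph.p_filesz == 0 || ph.p_offset > len || len - ph.p_offset < ph.p_filesz)
            return NULL;
        const char *s = (const char *)buf + ph.p_offset;
        /* The stored path includes its terminating NUL byte. */
        return memchr(s, '\0', ph.p_filesz) ? s : NULL;
    }
    return NULL;
}
```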
For example, debugging with strace will make it apparent if any memory mappings are failing. The loader also sometimes has its own error messages that are more descriptive than a normal segfault.
Also, don't forget LD_DEBUG for debugging your handmade ELFs:
man 8 ld.so
Your advice is completely on point. I spent some time a while back hand-writing ELF files in GNU Assembler. Statically linked executables are, indeed, quite straightforward. That said, there are definitely some mysteries, like the difference between p_vaddr and p_paddr in the program headers. Also, the difference between segments and sections is never really explained in the main references.
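The "Section to Segment mapping" that `readelf -l` prints is one way to see the split. Segments (program headers) are what the kernel and loader use at run time. Sections (section headers) are the linker's view of the file. Only allocated sections end up inside a loadable segment. The function below is a simplified sketch of that containment check, and the real check in binutils handles more corner cases. section_in_segment is a hypothetical name.

```c
#include <elf.h>

/* Simplified check: does section `sh` lie inside the memory image of
 * segment `ph`? Only SHF_ALLOC sections occupy memory at run time;
 * sections such as .symtab or .comment exist only in the linker's view. */
int section_in_segment(const Elf64_Shdr *sh, const Elf64_Phdr *ph)
{
    if (!(sh->sh_flags & SHF_ALLOC))
        return 0;
    Elf64_Addr seg_start = ph->p_vaddr;
    Elf64_Addr seg_end = ph->p_vaddr + ph->p_memsz;
    Elf64_Addr sec_end = sh->sh_addr + sh->sh_size;
    return sh->sh_addr >= seg_start && sec_end <= seg_end;
}
```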
Dynamically linked ELF executables are somewhat of a different beast, though. You need some decent familiarity with assembly and the broad strokes of program loading to make heads or tails of the details around relocation symbols.
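Applying a single x86-64 RELA entry gives a flavor of what relocation processing involves. This sketch makes two assumptions. First, the image is already mapped at `base`, and r_offset is relative to that base, as for a shared object or PIE. Second, symbol lookup is left out entirely: `sym_value` stands in for whatever the dynamic linker resolved ELF64_R_SYM(r_info) to. apply_rela is a hypothetical name, and it handles only four relocation types.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Applies one x86-64 RELA relocation to an image loaded at `base`.
 * B = load base, S = resolved symbol value, A = addend.
 * Returns 0 on success, -1 for unhandled relocation types. */
int apply_rela(unsigned char *base, const Elf64_Rela *r, uint64_t sym_value)
{
    uint64_t val;
    switch (ELF64_R_TYPE(r->r_info)) {
    case R_X86_64_RELATIVE:  /* B + A, no symbol involved */
        val = (uint64_t)(uintptr_t)base + (uint64_t)r->r_addend;
        break;
    case R_X86_64_GLOB_DAT:  /* S: GOT entry for data */
    case R_X86_64_JUMP_SLOT: /* S: GOT entry for a PLT call */
        val = sym_value;
        break;
    case R_X86_64_64:        /* S + A */
        val = sym_value + (uint64_t)r->r_addend;
        break;
    default:
        return -1;
    }
    /* Write the 64-bit result at the relocated location. */
    memcpy(base + r->r_offset, &val, sizeof val);
    return 0;
}
```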
When I was playing around with these things, I never succeeded at hand-writing x86-64 dynamically linked ELF to do anything other than immediately segfault somewhere during loading. Maybe I should give it another try.
Not too far back, I spent some time decompiling (small) static ELF binaries by hand. That really hammered in the x86-64 ISA and its subtleties. I also have Levine's "Linkers and Loaders" sitting on my shelf, which I hope to get around to reading sooner rather than later.
Utterly terrifying. They somehow segfault before executing a single instruction in the program's entry point. Even the likes of gdb are rendered powerless before the might of this uber segfault. I was reduced to posting readelf dumps on Stack Overflow. Mercifully, people immediately spotted the problem (unsorted PT_LOAD segments).
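For what it's worth, the ELF specification says that PT_LOAD entries in the program header table appear in ascending order of p_vaddr, so this is cheap to check before ever running the file. Here is a minimal sketch. first_unsorted_load is a hypothetical name.

```c
#include <elf.h>
#include <stddef.h>

/* Returns the index of the first PT_LOAD entry whose p_vaddr is lower than
 * the previous PT_LOAD entry's, or -1 if the loadable segments are sorted.
 * Non-PT_LOAD entries are ignored. */
long first_unsorted_load(const Elf64_Phdr *ph, size_t n)
{
    int have_prev = 0;
    Elf64_Addr prev = 0;
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD) continue;
        if (have_prev && ph[i].p_vaddr < prev) return (long)i;
        prev = ph[i].p_vaddr;
        have_prev = 1;
    }
    return -1;
}
```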
This was one of my favorite Stack Overflow debugging experiences: https://stackoverflow.com/a/12575044/1204143. A user posted that their code, `int main() { return 0; }`, was crashing with SIGFPE. I traced it to the fact that they had compiled the code with a fairly new GCC but ran it on a machine with a very old libc. That libc didn't understand `DT_GNU_HASH` and was trying to look symbols up in an empty `DT_HASH` table, computing the bucket indices mod 0 and hence raising SIGFPE.
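The two hash functions involved are short enough to show. In this sketch, elf_sysv_hash is the classic System V function used with DT_HASH, and elf_gnu_hash is the h * 33 + c function used with DT_GNU_HASH. sysv_bucket is a hypothetical helper. It shows the bucket computation that divides by zero when nbucket is 0, with a guard added.

```c
#include <stdint.h>

/* Classic System V ELF hash, used for DT_HASH tables. */
uint32_t elf_sysv_hash(const char *name)
{
    uint32_t h = 0, g;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        h = (h << 4) + *p;
        g = h & 0xf0000000u;
        if (g) h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* GNU hash (h * 33 + c, seeded with 5381), used for DT_GNU_HASH tables. */
uint32_t elf_gnu_hash(const char *name)
{
    uint32_t h = 5381;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++)
        h = h * 33 + *p;
    return h;
}

/* DT_HASH bucket index for a symbol. An empty table has nbucket == 0, and
 * an unguarded `hash % nbucket` then divides by zero (SIGFPE on x86).
 * Returns -1 when there is nothing to look up. */
long sysv_bucket(const char *name, uint32_t nbucket)
{
    if (nbucket == 0) return -1;
    return (long)(elf_sysv_hash(name) % nbucket);
}
```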
It's possible to debug these! If it's a segfault (as opposed to the kernel just refusing to load your file), it's usually because ld.so (the dynamic linker) has crashed, and you can debug ld.so explicitly (gdb ld-linux.so.2 ; run ./yourprog). With symbols it's usually feasible to identify the code in the dynamic linker that has crashed.
> Even the likes of gdb are rendered powerless before the might of this uber segfault.
It's quite easy to debug crashes in the dynamic linker if you use a more powerful debugger. For example, there is a Graal-based AMD64 VM [1] that can record an execution trace of the entire program run, including the dynamic linker. You can then analyze the trace offline and see exactly what did or didn't happen, where the linker crashed, and how it got there. In case you ever wondered what the kernel roughly does when loading an ELF file, look at the re-implementation in the ElfLoader class of that project.
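Such a loader does roughly this arithmetic for each PT_LOAD entry. The file-backed part has to be mapped at a page boundary, so p_vaddr and p_offset are both rounded down by the same amount. Whatever p_memsz exceeds p_filesz (.bss) must be zero-filled. This is a simplified sketch, not the kernel's or ElfLoader's actual code. plan_load and struct load_plan are hypothetical names. Protections, ET_DYN base addresses, and zeroing the tail of the last file page are left out.

```c
#include <elf.h>
#include <stdint.h>

struct load_plan {
    uint64_t map_addr;   /* page-aligned address to pass to mmap */
    uint64_t map_offset; /* page-aligned file offset to pass to mmap */
    uint64_t map_len;    /* length of the file-backed mapping */
    uint64_t bss_len;    /* bytes beyond p_filesz that must be zero */
};

/* Computes mmap-style parameters for one PT_LOAD entry.
 * `page` must be a power of two. Returns -1 if p_vaddr and p_offset are
 * not congruent modulo the page size (then a direct file mapping is impossible). */
int plan_load(const Elf64_Phdr *ph, uint64_t page, struct load_plan *out)
{
    uint64_t mask = page - 1;
    if ((ph->p_vaddr & mask) != (ph->p_offset & mask))
        return -1;
    uint64_t slack = ph->p_vaddr & mask; /* distance back to the page start */
    out->map_addr = ph->p_vaddr - slack;
    out->map_offset = ph->p_offset - slack;
    out->map_len = ph->p_filesz + slack;
    out->bss_len = ph->p_memsz > ph->p_filesz ? ph->p_memsz - ph->p_filesz : 0;
    return 0;
}
```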
In my case it was a static, freestanding nolibc program; there was no dynamic linker or ELF interpreter. :)
The shell's execve jumps directly to the entry point I provided. The execve itself was segfaulting somehow. I couldn't think of anything to do short of running this entire thing in a virtual machine and tracing the kernel itself to see which branch of the ELF loader I was ending up in.
> like the difference between p_vaddr and p_paddr in the program headers.
I see it being used pretty much exclusively in embedded systems, and it looks like some of them set p_paddr to an address but leave p_vaddr at 0. So its only apparent use is to indicate when virtual memory is not expected to be used. Otherwise its function is identical to p_vaddr: "Put the segment in memory here, please."
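Written as code, that interpretation amounts to something like the following. This is a hypothetical helper for a loader without virtual memory, not something taken from a real toolchain. On typical Linux executables p_paddr simply equals p_vaddr, so it returns p_vaddr there.

```c
#include <elf.h>

/* Hypothetical: pick where to place a segment in a loader without virtual
 * memory. If only p_paddr is filled in (p_vaddr left at 0), use it;
 * otherwise p_vaddr is the address to use. */
Elf64_Addr segment_load_addr(const Elf64_Phdr *ph)
{
    if (ph->p_vaddr == 0 && ph->p_paddr != 0)
        return ph->p_paddr;
    return ph->p_vaddr;
}
```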