Python one-file packagers such as PyInstaller typically use the technique of sho...

loeg · on Jan 4, 2024

> so you just point `sys.path` at the executable and voila, self-contained Python in a single binary.

You probably know this, but for context for others: the reason this works so well for zip in particular is that zip's top-level header (the end-of-central-directory record) sits at the end of the file, with pointers back to headers earlier in the file. The format doesn't expect its top-level header to be at offset zero.
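Here is a minimal sketch of why this works, in Python. It writes some arbitrary bytes (standing in for the real executable) followed by a zip archive, then puts that file on `sys.path`. The module name `tail_payload`, the stub bytes and both helper functions are hypothetical and exist only for illustration. This is not how PyInstaller itself is implemented.

```python
import importlib
import io
import os
import sys
import tempfile
import zipfile


def write_fake_executable(path: str, stub: bytes) -> None:
    """Write `stub` (standing in for a real executable) with a zip appended."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # One pure-Python module stored inside the archive.
        zf.writestr("tail_payload.py", "MESSAGE = 'imported from the end of the file'\n")
    with open(path, "wb") as f:
        f.write(stub)            # arbitrary bytes at offset zero
        f.write(buf.getvalue())  # the zip's end-of-central-directory record lands at EOF


def import_from_fake_executable(stub: bytes = b"\x7fELF" + b"\0" * 4096) -> str:
    with tempfile.TemporaryDirectory() as d:
        exe = os.path.join(d, "fake_exe")
        write_fake_executable(exe, stub)
        # zipimport reads the archive backwards from the end of the file,
        # so the leading stub bytes don't get in the way.
        sys.path.insert(0, exe)
        try:
            return importlib.import_module("tail_payload").MESSAGE
        finally:
            sys.path.remove(exe)
            sys.modules.pop("tail_payload", None)
```

`import_from_fake_executable()` should return the `MESSAGE` string from inside the archive, even though the file starts with an ELF magic number rather than a zip signature.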

ithkuil · on Jan 4, 2024

Nice. But the beauty of GP's approach is that you don't need to issue another open() syscall on argv[0] to access (read or mmap) your code. Instead you just piggyback on what the kernel already did to load your entry point.

Beyond the beauty of the approach, though, I'm not sure what the practical advantages are. Are there cases where a process wouldn't have permission to access its own executable, or where the argv[0] value is unreliable? Can you exec a file descriptor of a deleted file?

EDIT: Or would /proc/self/exe always point to something the process could open?

nneonneo · on Jan 4, 2024

I believe the process always holds the binary open as a "txt" file descriptor (check lsof), so opening `/proc/self/exe` always works.

You can even execve a "memfd", an in-memory "file" that is not in any filesystem (as opposed to a ramdisk file, which sits on an in-memory filesystem). /proc/self/exe still works in that case, even after the original memfd has been closed.

Note that argv[0] can never be relied on 100%. The vast majority of programs will set it correctly (especially since many programs malfunction when given a bogus argv[0]), but a caller has full control over argv[0] and can set it to anything, including a NULL pointer (by simply passing an empty argv array).

loeg · on Jan 4, 2024

> I believe the process always holds the binary open as a "txt" file descriptor

To be clear, the kernel has an association between the process and the "txt" file (because it is mmapped in), but this is not an application file descriptor (like 0, 1, 2, ...). If an application wants to read part of the file that isn't already mapped by a LOAD segment, it needs to open() a real file descriptor.
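Here is one way to see the distinction from inside a process, sketched in Python for Linux. The executable shows up as a file-backed mapping in `/proc/self/maps`, but normally not among the open descriptors listed in `/proc/self/fd`. The function name is hypothetical.

```python
import os


def exe_mapped_but_not_open():
    """Return (exe appears in /proc/self/maps, exe appears among /proc/self/fd)."""
    exe = os.readlink("/proc/self/exe")  # the kernel's record of our executable
    mapped = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            fields = line.split(maxsplit=5)
            if len(fields) == 6:  # the 6th column is the backing path, if any
                mapped.add(fields[5].rstrip("\n"))
    open_paths = set()
    for name in os.listdir("/proc/self/fd"):
        try:
            open_paths.add(os.readlink(f"/proc/self/fd/{name}"))
        except OSError:
            pass  # e.g. the directory fd used by listdir(), already closed
    return exe in mapped, exe in open_paths
```

Reading bytes outside the mapped segments would still mean calling `open("/proc/self/exe", "rb")` (or similar) and getting a new descriptor.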

ithkuil · on Jan 4, 2024

There's also the possibility that /proc isn't mounted, right?

matheusmoreira · on Jan 4, 2024

Absolutely. The kernel doesn't mount it automatically at boot; userspace has to, and depending on the system's configuration it might never get mounted at all. It could even be mounted at some other path.

One of my long term goals with the programming language I posted is to boot Linux directly into the interpreter and bring up the entire system from inside it. Not only will /proc not be mounted, my program's gonna be the one that mounts it. So I decided to avoid using tricks like reading /proc/self/exe.

loeg · on Jan 4, 2024

The benefit of GP's approach is mostly just elegance. You can reliably introspect the (mapped) program headers with getauxval(AT_PHDR).
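Here is a rough Python/ctypes sketch of that introspection. It assumes Linux, a 64-bit ELF executable, and a libc that exports `getauxval` (glibc 2.16+ or musl). The sketch reads the program header table the kernel already mapped and derives the load bias from the `PT_PHDR` entry. It then reads the first bytes of the offset-0 `PT_LOAD` segment, all without calling open(). A payload segment like the one GP describes could presumably be located the same way, through its program header. That segment isn't part of this excerpt, though, so the sketch only reads the ELF header. The helper names are hypothetical; the constants come from `<elf.h>`.

```python
import ctypes

# Auxiliary-vector keys and segment types from <elf.h> (Linux).
AT_PHDR, AT_PHENT, AT_PHNUM = 3, 4, 5
PT_LOAD, PT_PHDR = 1, 6


class Elf64_Phdr(ctypes.Structure):
    """64-bit ELF program header, as laid out in elf(5)."""
    _fields_ = [
        ("p_type", ctypes.c_uint32),
        ("p_flags", ctypes.c_uint32),
        ("p_offset", ctypes.c_uint64),
        ("p_vaddr", ctypes.c_uint64),
        ("p_paddr", ctypes.c_uint64),
        ("p_filesz", ctypes.c_uint64),
        ("p_memsz", ctypes.c_uint64),
        ("p_align", ctypes.c_uint64),
    ]


def main_program_headers():
    """Return (load_bias, [Elf64_Phdr, ...]) for the running executable."""
    libc = ctypes.CDLL(None)  # dlopen(NULL): symbols already loaded in this process
    libc.getauxval.restype = ctypes.c_ulong
    libc.getauxval.argtypes = [ctypes.c_ulong]
    phdr_addr = libc.getauxval(AT_PHDR)
    phent = libc.getauxval(AT_PHENT)
    phnum = libc.getauxval(AT_PHNUM)
    if phent != ctypes.sizeof(Elf64_Phdr):
        raise RuntimeError("this sketch only handles 64-bit ELF executables")
    # The headers are read straight out of memory the kernel already mapped.
    headers = [Elf64_Phdr.from_address(phdr_addr + i * phent) for i in range(phnum)]
    # PT_PHDR records where the header table was linked; the difference from
    # AT_PHDR is the load bias (non-zero for PIE executables).
    for h in headers:
        if h.p_type == PT_PHDR:
            return phdr_addr - h.p_vaddr, headers
    raise RuntimeError("no PT_PHDR entry; load bias not computed in this sketch")


def first_mapped_bytes(n: int = 4) -> bytes:
    """Read the start of the file-offset-0 PT_LOAD segment without calling open()."""
    bias, headers = main_program_headers()
    for h in headers:
        if h.p_type == PT_LOAD and h.p_offset == 0:
            return ctypes.string_at(bias + h.p_vaddr, n)
    raise RuntimeError("no PT_LOAD segment starting at file offset 0")
```

On a typical dynamically linked interpreter, `first_mapped_bytes()` should return the ELF magic `b"\x7fELF"`, read from memory rather than from the file.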

matheusmoreira · on Jan 4, 2024

> Are there cases where a process wouldn't have permissions to access its own executable

Yes. Permissions might have changed after execution has begun. The file might even have been removed. This creates a race condition.

> argv[0] value is unreliable?

It is. The program calling execve has complete control over the arguments and environment of the program being spawned. It could set argv[0] to anything, including the null pointer or the empty string.
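Here is a small Python sketch of how much control the caller has. `subprocess` lets you pick the program via `executable` while passing any string, including an empty one, as argv[0]. In this sketch `cat(1)` echoes its own `/proc/self/cmdline`. It assumes Linux and a `cat` that doesn't dispatch on argv[0] (GNU coreutils rather than BusyBox, for instance). The function name is made up. Python refuses an empty argv list outright, so the NULL case can't be shown this way.

```python
import shutil
import subprocess


def argv_seen_by_child(argv0: str) -> list:
    """Run cat(1) with a caller-chosen argv[0] and return the child's argv."""
    cat = shutil.which("cat")
    if cat is None:
        raise RuntimeError("cat(1) not found on PATH")
    # `executable` picks the program; args[0] is only what the child sees as argv[0].
    out = subprocess.run(
        [argv0, "/proc/self/cmdline"],
        executable=cat,
        capture_output=True,
        check=True,
    ).stdout
    # /proc/self/cmdline is the NUL-terminated argument vector of the cat process.
    return out.split(b"\0")[:-1]
```

For example, `argv_seen_by_child("definitely-not-cat")` gives `[b"definitely-not-cat", b"/proc/self/cmdline"]`, even though the binary that ran was `cat`.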

Last year I sent a patch to GNU coreutils that would let env set the argv[0] of programs. My purpose was to use env to test this exact edge case.

https://lists.gnu.org/archive/html/coreutils/2023-03/msg0000...

https://lists.gnu.org/archive/html/coreutils/2023-08/msg0006...

They said they were going to consider it. As of today, the feature has not yet made it in.

> Can you exec on a file descriptor of a deleted file?

Not sure. I assume it would cause the system call to fail.
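For what it's worth, a quick experiment on Linux suggests it works, provided the descriptor isn't open for writing (execve(2) fails with ETXTBSY in that case). The Python sketch below copies `true(1)`, opens the copy read-only and unlinks it. It then executes the copy through `/proc/self/fd/<n>`, which is essentially what fexecve(3) does. It assumes the temporary directory isn't mounted `noexec`; the helper name is hypothetical.

```python
import os
import shutil
import subprocess
import tempfile


def exec_deleted_file(program: str = "true") -> int:
    """Copy `program`, open it read-only, unlink it, then exec it through the fd."""
    src = shutil.which(program)
    if src is None:
        raise RuntimeError(f"{program} not found on PATH")
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, program)  # keep the basename for multi-call binaries
        shutil.copy(src, path)           # also copies the permission bits, incl. +x
        fd = os.open(path, os.O_RDONLY)  # read-only: a writable fd would cause ETXTBSY
        os.unlink(path)                  # the name is gone; the inode lives on via fd
        try:
            # The child execs /proc/self/fd/<fd>, i.e. the deleted file.
            return subprocess.run(
                [program], executable=f"/proc/self/fd/{fd}", pass_fds=(fd,)
            ).returncode
        finally:
            os.close(fd)
```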

> or would /proc/self/exe always point to something the process could open?

Not always. According to the manual there's some complexity involved:

https://www.man7.org/linux/man-pages/man5/proc.5.html

> If the pathname has been unlinked, the symbolic link will contain the string ' (deleted)' appended to the original pathname.

It's not 100% clear to me whether opening and reading the executable will still succeed in that case. I assume it wouldn't work, because the manual says it's just a symbolic link to the executable, which would become a dangling link if the file it points to is deleted.
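As far as I know, opening it does still work. `/proc/<pid>/exe` is one of procfs's "magic" links: opening it follows the kernel's reference to the file rather than looking up the printed pathname. Here is a Linux-only Python sketch to check this. It starts a copy of `sleep(1)`, deletes the copy, then reads `/proc/<pid>/exe`. It assumes a `sleep` binary on `PATH` and a temporary directory that allows execution; the helper name is hypothetical.

```python
import os
import shutil
import subprocess
import tempfile


def read_exe_after_unlink(n: int = 4):
    """Start a copy of sleep(1), delete it, then inspect /proc/<pid>/exe."""
    src = shutil.which("sleep")
    if src is None:
        raise RuntimeError("sleep(1) not found on PATH")
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sleep")  # keep the name for multi-call binaries
        shutil.copy(src, path)
        proc = subprocess.Popen([path, "30"])  # returns once execve() has succeeded
        try:
            os.unlink(path)  # the pathname is gone, but the process keeps running
            link = os.readlink(f"/proc/{proc.pid}/exe")
            # Opening the magic link reaches the original inode, not the old path.
            with open(f"/proc/{proc.pid}/exe", "rb") as f:
                head = f.read(n)
            return link, head
        finally:
            proc.kill()
            proc.wait()
```

The returned link should end in ` (deleted)`, while the bytes read through it still start with `b"\x7fELF"`.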

There's more: permissions to read the link can be revoked, the link is invalidated if the main thread ever exits, it has a completely different format in old Linux versions...

The ELF segment approach sidesteps everything in this comment by getting Linux to mmap the data in, just like the program's text and data segments. The data is ready before the program even runs.