Actually Portable Executables (ahgamut.github.io)
683 points by krab on Feb 28, 2021 | 154 comments


Cosmopolitan Libc author here. When I saw this article earlier today, it put a big smile on my face, because I would have thought there'd be so many more issues than there turned out to be! For additional context, we've got a GitHub issue that's tracking progress on the changes that need to be made based on what we learned: https://github.com/jart/cosmopolitan/issues/61


So first of all, I just want to echo the general sentiment here and say that all of this is beyond awesome. Cosmopolitan libc seems to have the potential to literally re-define what constitutes portable code.

With that out of the way, am I understanding correctly that the way this works on Linux/Unix is that the binary modifies itself (by overwriting the EXE file header with an ELF header)? This seems to have the consequence of making that specific file no longer portable. If I'm understanding things correctly, it also looks like the QEMU hack for non-x86_64 architectures will only work once per file, since after the first time running the file it will no longer run as a shell script on Unix, so the QEMU invocation will be unreachable.

Have you considered adding workarounds for this?


On Unix it's a Thompson shell script that pipes an ELF binary to 'exec', I believe


I'm the author of the blog post!

As someone who has never messed around with Libc-level programming, it was surprisingly straightforward (and exciting!) to compile Lua all the way.

Cosmopolitan is incredible, thank you so much.


Only your own creativity seems to match your intelligence; excellent, excellent work!

I was wondering if we can accompany runtimes written in C (like lua) with some scripts? So we have cross-platform, double-click-to-run scripting?

For us poor souls who can't write reliable C code, that would be a great thing!


Yes we can do that. There's a real opportunity here to define a novel API / framework on top of Cosmopolitan that does just that. It's something I myself have personally refrained from, mostly because fiddling with the low level bits is what I'm best at, and novel APIs require a certain kind of artist. But I'm certain that someone if not myself at some point is going to come along and use Cosmo for that purpose and it'll be pretty cool.


I'm an old Delphi/Pascal application programmer here, trying to get the lay of the land in the C/C++ world.

Why is there a forest of files in Cosmopolitan Libc? I tried looking at it on Github to see how things were done, and there were a lot of .h and .S files, I couldn't actually find the C source, though I'm sure somewhere in there is a .c file.

Doesn't fragmenting the source into thousands of files make compilation far slower than it needs to be?

Also, I wonder if/how functions that aren't called in a program get trimmed away by the linker, and thus don't make the executable larger.


It takes about 18 seconds to build the entire Cosmopolitan repo and run all its unit tests on my PC. During that time the make command builds 14,376 .o files, 66 .a archives, and 421 .com executables. It's an exceedingly fast build config.

Having lots of objects is a good thing because it helps static linking work better. When the Unix linker loads a symbol from a static archive, it pulls in the whole module that defines that symbol. For example, if you define memcpy() and memset() in the same .c file and your app only needs one of them, they'll both go towards bloating your binary. Workarounds like -ffunction-sections and -Wl,--gc-sections exist, but a C library should assume as few flags as feasible.
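
To make that concrete, imagine the library laid out something like this (a simplified sketch, not our actual source files):

    /* memcpy.c -> memcpy.o, archived as its own member of libc.a */
    #include <stddef.h>

    void *memcpy(void *dst, const void *src, size_t n) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--) *d++ = *s++;
        return dst;
    }

    /* memset.c -> memset.o; a program that never calls memset()
       simply never pulls this member out of the archive */
    #include <stddef.h>

    void *memset(void *p, int c, size_t n) {
        unsigned char *b = p;
        while (n--) *b++ = (unsigned char)c;
        return p;
    }

Since ar keeps each function in its own archive member, the linker only pulls in the members whose symbols your program actually references.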

What's the end result? We're able to build executables that are 12kb in size which run on seven different operating systems. The big codebase is what made tiny binaries possible: https://justine.lol/cosmopolitan/howfat.html


Do you build on tmpfs? I started doing that and it makes builds ridiculously fast. $XDG_RUNTIME_DIR is perfect for this, I just make a directory there and symlink the build directory to it. Even added some makefile logic to ensure the link target exists:

  build_directory_link := $(shell readlink $(build_directory))
  $(if $(build_directory_link),$(shell mkdir -p $(build_directory_link)))


Good optimization, thanks for the pointer!


It also reduces wear on storage devices. Why write temporary build artifacts to permanent memory after all? Gotta use those 32 GB of RAM for something.

Linux will automatically cache all source files in RAM because of the page cache. Writes to slow storage devices are eliminated through tmpfs. Amazing really.


Also, if you are using BFD ld, you should consider gold or lld. Significantly faster if the link is the slow step.

Also much lower peak memory usage for lld, which is great if you have multiple concurrent link steps.


May I ask - what was the reason for creating APE/Cosmopolitan?

I read your post about actually portable executable format but I wonder if it's something that you found immediately helpful for some project or if you work on it just out of curiosity.


> Doesn't fragmenting the source into thousands of files make compilation far slower than it needs to be?

This allows incremental compilation to work much better than it otherwise would. Your first clean build may be slightly slower, but after that, you only have to recompile the files you change.


Compiling ASM and even C is very fast these days. No generics, small compilation units, etc. Even with optimizations it's usually a non-issue in my experience.

For a data point, it takes me ~2 minutes to build a reasonably fully featured ARM Linux kernel with a cold cache on a 5-year-old i5. wc counts 1399 CC, 65 AS and 411 LD calls. And of course incremental rebuilds are only a fraction of that.


One question: is it possible to load .so files (through dlopen/dlsym) on Linux when compiling to APE?

I was working on something similar (though much smaller in scope), but had to stop when I realised that `ld-linux.so` has some internal APIs that Glibc uses to set up dlopen()/dlsym(); essentially meaning that it is very hard - if not impossible - to load any shared libraries if one does not link to Glibc.

What I wouldn't give for a liblinux, similar to NTDLL.


> What I wouldn't give for a liblinux, similar to NTDLL.

Hey I actually tried to make such a thing.

https://github.com/matheusmoreira/liblinux

It provides access to Linux system calls and process start up code that gets all the arguments, environment and auxiliary values. I have several examples of applications written in 100% freestanding C with zero dependencies other than this library:

https://github.com/matheusmoreira/liblinux/tree/master/examp...

It's a bit too low level but I actually planned to make my own ld-liblinux eventually. It currently works with static and dynamic linking. The ld-linux.so is able to link even though nothing else links against glibc. I didn't test dynamic loading though.

I stopped working on it because I found a better solution for system calls in the Linux kernel repository:

https://github.com/torvalds/linux/blob/master/tools/include/...

Do you think liblinux could have a future?


> I found a better solution for system calls

Yeah, I have seen that file. Though last I checked, it crashes under certain conditions [1]; I didn't really go back to it to fix it.

> I actually planned to make my own ld-liblinux eventually

Can you share this code? If not, could you document how exactly you would have done this? I lost all motivation once I realized that ld-linux and Glibc communicate with private APIs, but I'd love to actually work on it if it was possible to do this.

> Do you think liblinux could have a future?

Absolutely. The problem with Linux currently is that the loader rejects programs at the slightest sign of incompatibility before even trying to execute them, which means that there is nothing the application code could do to work around said incompatibility. If we were able to get our code running (or, as I like to say it, make %RIP point inside the main function), we could then do whatever hacky bullshit was required to make the software work. All I need is to get our code executing, and from that point on, I'll handle system compatibility.

[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95411


Not sure why the stack is unaligned there. The way I solved this in liblinux is to just pass the stack pointer to a C function and compute all the other pointers there. That way the compiler deals with the stack stuff.

  _start:
      xorq %rbp,%rbp    /* user code should mark the deepest stack frame
                         * by setting the frame pointer to zero
                         */

      movq %rsp,%rdi

      call liblinux_start

      movq %rax,%rdi
      movq $__NR_exit,%rax
      syscall

  static void *after(void *vector) {
      void **pointer = (void **) vector;
      while (*pointer++ != 0);
      return pointer;
  }

  int liblinux_start(void *stack_pointer) {
      long count;
      char **arguments;
      char **environment;
      struct auxiliary *values;

      count = *((long *) stack_pointer);
      arguments = ((char **) stack_pointer) + 1;
      environment = arguments + count + 1;
      values = after(environment);

      /* start is the main function */
      return start(count, arguments, environment, values);
  }

> Can you share this code? If not, could you document how exactly you would have done this?

Unfortunately I didn't make it that far. I was planning to study how glibc and musl do it just like how I studied their system call implementations. If I start working on this again I'm gonna look this up. I'll need to learn this linking stuff in order to support the kernel vDSO anyway.


It doesn't appear to be supported in their libc implementation:

https://github.com/jart/cosmopolitan/blob/d7ac16a9ed56ebdc70...


Yeah, as I feared. Someone needs to wrest control of ld-linux.so from Glibc's hands and give it to the kernel developers (who "will not break userspace"). Otherwise, Linux will always be an all-work-for-no-reward platform.


Looks like you're also giving up position independent code as that also relies on the dynamic linker.


Have you considered writing an index for Cosmopolitan's API by topic? The current reference documentation is rather daunting, and it is very difficult to determine at a glance what sorts of functionality are available, and thus what kinds of applications could be written or ported.

What would it take to create an analog of SDL - some kind of lowest-common-denominator interface for mouse/keyboard I/O, audio output, and software-rendered graphics?


Cosmopolitan is a C library that makes an effort to conform to POSIX and ANSI standards. So in addition to its documentation page, you can dust off any old textbook on C development and use that as a reference or tutorial.

Something like SDL could be ported to run on top of Cosmopolitan but we'd need to port all its dependencies too. That would be a massive undertaking. If Cosmopolitan ends up being the next big thing and attracts a large community of contributors then something like that is bound to happen. Right now it's just a scrappy ambitious project being built for fun. So GUIs aren't something we're able to do quite yet. Although we've got TUIs working great! You can have the best most portable console / terminal apps in the world, today.

Think about it this way. How many years did it take Microsoft and Linux before they could offer a polished GUI? That should give you a rough estimate of how long it takes to develop these sorts of things from first-principles.


Any recommendations to somebody who wants to learn C "properly" in $CURRENT_YEAR?


https://www.oreilly.com/library/view/21st-century-c/97814919...

I thought 21st Century C was good, I've still kept my copy. I'd happily recommend it.

I like the K&R book too - it feels reeeeeeally old but it's really short.

There's a few others that have helped me in various ways but these are a little older -

Love C by Tim Love (free online, my copy is something I just printed out, it's not that long).

Programming from the Ground Up (x86 assembly) - this is available freely online but I bought the book, and that helped me a lot with C even though it's a book with only assembly... (to be fair, it does go through calling conventions).

Finally there's another book I love, Advanced Programming in the Unix Environment by Stevens; I have the 6th edition updated by Rago after Stevens' passing. Fascinating book - but huge.


I really liked "The C Programming Language (second edition)" by Kernighan & Ritchie. It doesn't teach C99 and later features, but if you want to write portable code you're not using those anyway.


From various discussions on HN, I think the consensus is most people really like K&R as a technical book (it's concise, clear and a fun read) - but everyone working with C says it's a terrible book for learning modern C.

I tried to refresh my memory from previous threads, and I think the general trend has been to recommend (as seen in sibling comments):

https://modernc.gforge.inria.fr/

https://nostarch.com/Effective_C

I've also seen the older:

https://www.oreilly.com/library/view/21st-century-c/97814493...

Mentioned.

There's also Architecture of Open Source Applications, which can help with starting to read some larger code bases, some of which are in C: http://aosabook.org/en/index.html

And there's a general recommendation to go read the source code of the Redis cache/db.

Finally I came across a mention of this short article on GDB (nb: the mention of the TUI text UI should probably have been at the top, not in a footnote):

https://www.recurse.com/blog/5-learning-c-with-gdb

I feel I'm missing a book that has come up often, but can't think of which one.

I did like Zed Shaw's Learn C the Hard Way, but I'm afraid it's getting a bit long in the tooth: https://learncodethehardway.org/c/


K&R is a horrible way to learn C. It's the best way to make someone run in the other direction, though.

It's like telling someone that SICP is the best way to learn Scheme. Maybe, but lots of people learn differently.

Unfortunately I don't have good suggestions myself, but at least they won't feel bad if K&R isn't their cup of tea.


I am just trying to learn myself so take this with a grain of salt, but I have found the GitHub of antirez (https://github.com/antirez) - the author of Redis - to be an amazing resource. The code is incredibly well documented and to my eyes pretty well written. A good place to start I think is Kilo (https://github.com/antirez/kilo/blob/master/kilo.c) - a 1000-line text editor with no dependencies.


APE makes it viable for a cross-platform C library with a cross-platform C compiler to make cross-platform binaries.

Besides I was in the same boat as you. I come from the world of JS/Python/Go. I even wrote https://github.com/Himujjal/ekon in pure C recently. The reason I thought C would be good is performance and portability. But I found it to be better to invest time in Rust rather than in C. C is a fantastic language but cross platform dependency management is difficult. Unit Testing is also difficult. There are solutions but not as efficient as Rust's ecosystem.

BTW I wish there could be a Cargo for every language.


> C is a fantastic language

Odd...I find that a very strange sentiment.

I thoroughly enjoy C, and have even written a compiler for it, but I don't think it's a well-designed language by any account.


I can't remember where I learned the basics, but if you aren't new to programming in general then I think there are only three important things you need to know: what is a pointer, what is a macro, and always free what you malloced. Reading a book or tutorial will be helpful for this. In my day it was K&R, no doubt there are more up-to-date ones now.
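
For what it's worth, all three fit in a dozen lines:

    #include <stdio.h>
    #include <stdlib.h>

    #define SQUARE(x) ((x) * (x))    /* a macro: plain textual substitution */

    int main(void) {
        int *nums = malloc(3 * sizeof *nums);  /* a pointer to heap memory */
        if (!nums) return 1;
        for (int i = 0; i < 3; i++) nums[i] = SQUARE(i + 1);
        printf("%d %d %d\n", nums[0], nums[1], nums[2]);
        free(nums);                   /* always free what you malloc'd */
        return 0;
    }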

But the next step, which I think is far more important, is to look at the source code of tools that you actually use in real life. Things like cp, or wc, or head. You've probably used them for years without thinking about it, but they're all written in C. Don't look at the modern GNU versions just yet, since they can be packed full of complex functionality, start with micro implementations like Busybox or Toybox. Then look at some OSes that are known for super clean code like NetBSD or OpenBSD. In those OSes you have the benefit of being able to go back to the version that existed 25+ years ago, so you can read through the diffs to see how they adopted new features and found new ways to address C's biggest flaw/risk - memory exploits.

If you're interested in kernel programming, check the Linux and BSD sources, and lurk on the mailing lists for a while to see how people talk about the code. The review process tends to be a bit more brusque than you might be used to on Github, but it's often detailed and results in code that is of a relatively high quality, or at least a consistent standard. It's a great place to learn.


Try the new books: Effective C and the free Modern C one. They actually teach the language and all the concepts.



Yup



K&R -> do some random programming problems -> small project -> learn ASM -> learn how to do a buffer overflow (read "Smashing the stack for fun and profit") -> from here find your own path.

The first three are just useful for getting the basics. The next two are for starting to learn low level stuff. You don't really understand C until you understand how it relates to lower level code.


$CURRENT_BEST_BOOK Sorry :)


C Primer Plus by Stephen Prata is the most complete I have seen; it covers everything up to C11 and goes into detail about stuff like arrays vs. pointers.

I don't like 21st Century C; it may have changed since then, but it has a chapter titled "Object-Oriented Programming in C" which confuses objects with abstract data types.


I assume object oriented C means using structs and function pointers?


No. For 21st Century C, an "object" is an incomplete type plus the procedures that operate on it.

    struct Object;
    void Object_init(struct Object *o);
That's ADT.
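
Spelled out a little more (my own illustrative sketch, not code from the book):

    /* point.h -- callers only ever see the incomplete type */
    struct Point;
    struct Point *Point_new(double x, double y);
    double Point_x(const struct Point *p);
    void Point_free(struct Point *p);

    /* point.c -- the only file that knows the representation */
    #include <stdlib.h>

    struct Point { double x, y; };

    struct Point *Point_new(double x, double y) {
        struct Point *p = malloc(sizeof *p);
        if (p) { p->x = x; p->y = y; }
        return p;
    }

    double Point_x(const struct Point *p) { return p->x; }

    void Point_free(struct Point *p) { free(p); }

Callers can only go through those procedures, which is the abstract-data-type style; there are no function pointers or dispatch tables involved, which is what people usually mean by object-oriented C.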


Not sure if it's been mentioned, but the logical next step would seem to be extending the zip hack[1] to allow embedding lua source with the binary?

[1] https://news.ycombinator.com/item?id=26271117


Congratulations on all you've done so far, it's amazing.


Copy-pasted from thread about libc:

I missed an opportunity to ask in the previous thread: what would it take to link an app in a different language (say, Rust) with this library? Is it enough to just build an object file that has libc functions (assuming the LP64 ABI) as unresolved symbols?


Amazing accomplishment, a great hack!


In my opinion, the most important drawback to your work and its adoption is the use of fancy unicode characters in the name.

I think your work is outstanding, but it looks a bit childish/bizarre with this name, and thus it might prevent some people from trusting its reliability.


Naming Your Child | David Mitchell's Soapbox

https://www.youtube.com/watch?v=Xblh12XgQ4o

The first half is relevant.


The future is now, old man


You must mean APE. But the package itself doesn't seem to use any such characters.


I just saw this tweet - https://twitter.com/JustineTunney/status/1365805503561957376

Cosmopolitan is going to be a big thing.


>We've just created a 336kb Lua interpreter binary that runs on seven operating systems, thanks to Cosmopolitan Libc.

https://ahgamut.github.io/c/2021/02/27/ape-cosmo/


Awesome! But why no download link?


You can compile it yourself if you like. I created a fork of the Lua Github mirror and made the necessary changes to compile with Cosmopolitan:

https://github.com/ahgamut/lua/tree/cosmopolitan


Because, as OP says, it is not totally complete and lacks some parts.


Likely only if it’s forked away from anything to do with its current maintainer, for reasons only a Google search away:

http://valleywag.gawker.com/why-does-google-employ-a-pro-sla...

That’s the problem with trolling like that: once you do something cool that’s unrelated and try to move on it’s not clear if you’re genuine about anything any more. The history there will absolutely make larger interests hesitant to associate, contribute or integrate in something like, say, LLVM (regardless of the merit or lack thereof), particularly in today’s less forgiving culture.

Note that I’m not sharing an opinion on the content. It’s a metaobservation about where we are, for better or worse. That’s on page 1 of my Google search for “wow, who’s this amazing coder?”, so my path is the same anyone else would take doing due diligence on getting involved.


Thank you for sharing this, I personally am much less excited about the code now.

I took a look through her twitter, and her current opinions seem to be broadly similar to the ones she expressed in 2014.


> Likely only if it’s forked away from anything to do with its current maintainer,

And exactly why should I care about the current maintainer? Except maybe to say "thank you"?

Let's put the question upside down: what have you (or the gawker journalist) brought to me? I mean, besides your outraged opinions (which anyone is entitled to have!)

On one hand, you have a wonderful technical creation by someone asking nothing in return. On the other, you are providing... hatred towards the author? On a 2 hours old account?

I'm sorry if she breaks the kumbaya illusion, but people are not equal. Some can't escape the working class, or the welfare class - not due to any personal failure or limits, but due to societal indoctrination.

Some other people break free, and then bring gifts of the gods to mankind. They are called innovators, entrepreneurs - the names vary. But you know one when you see one. There is nothing in Cosmopolitan that was "impossible" to achieve by anyone dedicated, even years before. There's nothing magic - except in the mind of the author, who associated the right pieces together.

> The history there will absolutely make larger interests hesitant to associate, contribute or integrate

If anything, your comment makes me less likely to associate with you!

> particularly in today’s less forgiving culture.

I couldn't care less. The less forgiving culture is a problem for those who need to work ("will code for food" as they said during the dotcom crisis) or integrate with the rest of society. Unfortunately, that means most of the people here, especially those working for the FANGs.

But there are some of us who just don't care - except to lament that, when shown the way to escape, most geeks double down on their own mistakes, and remain chained to their FANGs masters.

So please go on attacking the author if that makes you happy, while the rest of us admire the creativity and new things made possible by cosmopolitan (and make money out of that too!)


If you don't care, that's okay. Other people are allowed to care and state their opinions.

You may want to notice that the author, Justine Tunney, was also chained to her "FANGs master" Google during the period described.


> Justine Tunney, was also chained to her "FANGs master" Google

And now she's free, which creates an even more compelling narrative: true creativity can only shine in freedom, not in chains


>Rather than inclusionary leftism.

The traditional left didn't give a shit about gender, race or whatever excuse you think you should declare yourself "inferior" over. Just work and equality.

I hate both poshy techno-aristocrats (I would kick them out to Somalia or a Mexican narco shithole, to see what they can do without social support or state-backed police), and SJWs, who have far more points in common with fascism (fuck that so-called cultural appropriation) than with the common worker.


I mean, you’re engaging with the espoused opinion of someone who simultaneously advocated class warfare and drew Google payroll, so you may as well be arguing with a figment of our collective imagination. If that’s not obvious by nature of how incongruous it is I fear for the Internet.


Previous discussion of Cosmopolitan: https://news.ycombinator.com/item?id=25556286


Can somebody explain in layman’s terms what this is all about? I tried my best reading the other thread, but it went well above me.


Windows, macOS, Linux, and BSD all (usually) run on Intel processors, so in theory it should be possible to write a program that runs on all four. However, despite ultimately using the same instruction sets, the four OSes use different formats to store metadata about the programs, and have different ways for programs to communicate with the operating system. Justine Tunney created two things:

1) Actually Portable Executables, a clever way of formatting a program so that all four OSes interpret it as a valid program in their own format.

2) Cosmopolitan libc, a library for communicating with the OS that handles each OS's interface, allowing programs to work on all four.


Thank you (and thanks to everybody else for their helpful answers). Your note on intel processors meaning in theory programs should run on all operating systems was most helpful and got me across the line - now I get it!


That's pretty much it. Cosmopolitan Libc is the simplest project that could have happened decades ago. Anyone could have built it. There's never been a technical reason why an x86 binary can't run on all x86 operating systems. It's just that traditionally only operating systems publish C libraries, and they have no incentive to spend money supporting their competitors. So it's the kind of project that could only happen if it was done by an indie developer just trying to have fun.


Another reason is that it requires fairly thorough knowledge of 7 different programming environments. Most people don't understand 1.


The only programming environment I used to build this was Emacs on Linux. The rest was simply researching the magic numbers needed to use the binary interface of each system. I figured life's too short to be pulling my hair out with MSVC/Xcode warnings.


s/programming environments/systems/


Does cosmolibc allow you to run on bare metal? If so, I've got an old project in mind that you may have just made happen, and I'm very very excited to know!


"Actually Portable Executable"s can boot from BIOS if that's what you're wondering.

https://justine.lol/ape.html


Cool, thanks! I missed that before. It makes a project idea I had way more realistic for me to implement.


How does one unzip the .com file? Is that even possible or have I misunderstood how it works?


I recommend using InfoZIP on the Linux or Mac command line. For example, to list the files that are in your .com executables you can run a command like:

    $ wget https://justine.lol/redbean/redbean-2021-02-27.com
    $ unzip -vl redbean-2021-02-27.com
    Archive:  redbean-2021-02-27.com
     Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
    --------  ------  ------- ---- ---------- ----- --------  ----
         426  Defl:N      215  50% 02-09-2021 09:24 f9cc9464  tool/net/redbean.css
         896  Defl:N      509  43% 02-09-2021 09:24 eb74f8a0  tool/net/redbean.html
       16958  Defl:N     6093  64% 02-09-2021 09:24 6a76bd39  tool/net/redbean.ico
        5073  Stored     5073   0% 02-09-2021 09:24 86c2afe6  tool/net/redbean.png
         554  Defl:N      285  49% 02-09-2021 09:24 0aa6870a  usr/share/zoneinfo/Beijing
        2335  Defl:N     1093  53% 02-09-2021 09:24 6ccd4450  usr/share/zoneinfo/Berlin
        2453  Defl:N     1170  52% 02-09-2021 09:24 7fed6d80  usr/share/zoneinfo/Boulder
    ...
Another ZIP tool that works well with APE is Windows 10 file explorer. But you have to change the extension from .com to .zip temporarily.


What happens when Intel goes bust and we all have to switch to new processors?


A little optimistic about Apple's SoCs, aren't you?

Also... given that AMD exists and the x86 instruction set is not some magic potion ingredient...


They also run on AMD ;)


So it's like a compatibility layer, like Wine, except built into the binary. Does this bloat the binary as well?


The only code that really needs to be duplicated would be the lowest level of userspace code that actually calls into the various operating systems. Things like the guts of "open", "read", "write", etc. Regardless of any particular OS's flavor, they're all going to roughly take in the same arguments and return the same results via some flavor of system call.

Each individual function is probably on the order of a few dozen machine instructions. Let's say there's 30 such functions that are implemented and 7 OSes to support. 36 × 30 × 7 is about 7500 instructions. The average instruction length on x86-64 is 2-3 bytes. So maybe it ends up around 20KB of code space for a program that uses every possible operating system feature that's implemented?

I haven't looked at the implementation. I wonder how it selects between OS implementations. Does it switch on every system call, use a function pointer table, or do some sort of clever in-memory code rewriting?
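
For illustration, the simplest of those options would be something like this. It's a guess with made-up names, not Cosmopolitan's actual code; the numbers are the usual x86-64 syscall numbers for write (1 on Linux, 4 on the BSDs, with XNU adding a class offset):

    #include <stddef.h>

    enum host { HOST_LINUX, HOST_BSD, HOST_XNU, HOST_WINDOWS };
    extern enum host __hostos;   /* hypothetical flag detected once at startup */

    /* hypothetical stubs: a tiny asm routine that issues the syscall
       instruction, and a shim that calls WriteFile() on Windows, which
       has no stable syscall ABI */
    extern long __syscall3(long nr, long a, long b, long c);
    extern long __nt_write(long fd, const void *buf, size_t size);

    long my_write(long fd, const void *buf, size_t size) {
        switch (__hostos) {
            case HOST_LINUX:   return __syscall3(1, fd, (long)buf, (long)size);
            case HOST_BSD:     return __syscall3(4, fd, (long)buf, (long)size);
            case HOST_XNU:     return __syscall3(0x2000004, fd, (long)buf, (long)size);
            case HOST_WINDOWS: return __nt_write(fd, buf, size);
        }
        return -1;
    }

The real library may well fold those numbers into tables or patch them at startup; this just shows the shape of branch-on-detected-OS dispatch.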


Will this usher in an optimized version of Electron or even deprecate it?


Well, the main problem with Electron is that every app ships its own browser, using tremendous resources at runtime, having huge attack surfaces and lots of unused functionality. So just giving one optimized Electron version to all users certainly does not solve the problem, as APE does nothing to change these issues.

Speaking of deprecating it, that might be one scenario, but developers probably don't just choose Electron because it is platform independent. I mean, there were other frameworks that supported that use-case even before Electron (e.g. Qt). However, what is specific to Electron is that you can use web technologies to build apps. But that is also nothing you can do with APE.

So as much as I like to see one binary format for many operating systems, I doubt that it will change anything regarding the Electron problem.

There are certainly multiple ways of solving it, but in my opinion, Apple and Microsoft must come to a common ground and support some kind of web-view that is built into the OS and extended with a platform-independent API for the things that normal browsers do not support/allow, like filesystem access.

At some point both supported the Progressive Web App (PWA) movement for a while, but afaik Apple is the one holding it back at the moment. My guess is that they saw their App Store business at risk. And even if PWAs were a thing, there are still some use-cases they cannot cover in the current state (e.g. missing filesystem access).


the main problem with electron is js


Technically, I guess you could use this to compile a universal browser binary that would run on all platforms. However, this wouldn't solve any of the problems caused by Electron.


I think his point was that with Cosmopolitan native binaries one would no longer need Electron to provide cross-platform binaries. There is a lack of good GUI frameworks to build this native Electron replacement with, though, nor is it certain the effort would be manageable.


Or really any of the problems Electron attempts to solve; the GUI toolkits are still (generally) not cross-platform in the way Electron is.


Note that packaged Electron apps are not portable across OSes, so it's orthogonal. What this could do is make the same Electron package work on different OSes without a rebuild by using a portable browser binary.

Meanwhile Cosmopolitan doesn't provide a portable GUI toolkit, so it can't replace Electron on its own. Maybe if you could package Cosmopolitan + Qt or something like that, you could end up with something very interesting though.


> What this could do is make the same Electron package work on different OSes without a rebuild by using a portable browser binary.

Likewise for non-GUI apps, like command-line utilities that are implemented with NodeJS. As prior art, Fabrice Bellard's QuickJS is able to compile JS programs to spit out a native binary that requires no external qjs runtime. If you modified it to output Actually Portable Executable binaries instead, it would be even more portable than the programs installed with `npm install`.

I've done some experiments and progressed to writing tooling that approaches that I-want-ridiculous-portability problem differently. It doesn't use APE, so it still requires a separate interpreter (such as NodeJS) capable of executing the compiled output, but the object files it outputs are such that even if you don't have a copy of NodeJS installed on your system (or the right copy of NodeJS installed), then you can piggy back off the interpreter built in to your browser. You get a similar double-click-to-run experience, but when you do double-click to run, it launches in the browser, and you can go from object file to modifiable source trivially, since the object file and the Git repo from which it is built are automorphisms. The downside, if you consider it one, is that it doesn't work with arbitrary NPM modules. You have to really buy in to the scripting language dialect and the development philosophy to get anything out of it.

It would be interesting to use the QuickJS+APE strategy above so that there is _no_ reliance on an external interpreter, at the cost of marrying yourself to x86 and introducing some small hurdles in the path to modification and also giving up the safety of the browser sandbox, since you're back to running arbitrary executables.


The author somehow came up with an executable file format that's compatible with everything. It's simultaneously a valid shell script, Unix ELF, Windows PE file, boot sector code and zip file.

Then she wrote a standard C library that detects which OS you're using at runtime and branches to the appropriate implementation, allowing the same x86_64 code to run on any supported system.

  if (IsWindows())
      DoWindowsThing();
  else
      DoUnixThing();

Traditionally, these system differences are resolved at build time: only the code for the target platform is included. Why would anyone want Windows-specific code in a Linux build? It's dead code... right? The new run-anywhere executable format makes that code useful.


> Why would anyone want Windows-specific code in a Linux build? It's dead code...

You might want to ask the thousands of Electron App users how much they care about dead code ;-)

I mean, you are right. But if you offer your users no alternative they might not care enough to choose another product. In addition, some might even value not having to choose the right binary for their system.


> In addition, some might even value not having to choose the right binary for their system.

Yes. The true innovation that made all this possible is the novel Actually Portable Executable (APE) format. This means there's no need to choose: Windows will read it as a valid PE file while Linux, the BSDs and Mac will read it as a valid ELF program.

Having code for all operating systems in the executable has always been possible. It's just that before APE there used to be no point. Windows simply isn't able to load and execute an ELF program even if there is Windows code in it.

ELFs would be incompatible even between Linux, Mac and the BSDs since they would almost always depend on very different dynamic libraries. Cosmopolitan fixes this by providing a standard C library with support for everything. I assume incompatibilities can still be introduced by linking against other libraries.


Are you sure? Other answers are suggesting it isn't a valid ELF file and I don't get why it would need to be a valid boot sector or zip file.

You might be right though. The author might be very good at executable hacks but they're a terrible writer. Their explanation explains nothing.


The author seems to be an excellent writer. They just may not have taken the enormous amount of time to answer tons of questions or write tons of documentation people are seeking. One person did all this!


You really think so? Here's their description of how it works:

> Here's how it works:

    MZqFpD='
    BIOS BOOT SECTOR'
    exec 7<> $(command -v $0)
    printf '\177ELF...LINKER-ENCODED-FREEBSD-HEADER' >&7
    exec "$0" "$@"
    exec qemu-x86_64 "$0" "$@"
    exit 1
    REAL MODE...
    ELF SEGMENTS...
    OPENBSD NOTE...
    NETBSD NOTE...
    MACHO HEADERS...
    CODE AND DATA...
    ZIP DIRECTORY...

Ok well that clears it up. MZqFpD='. Of course!


Actually I'm not 100% sure. I can barely make sense of it. All I know is it somehow works as if it was a valid ELF. Some dark magic going on here...


No worries. You’re in good company. This actually portable executable thing is batshit insane (in the best possible way).


In short, there is a "framework" that allows you to compile a single binary that can run under Linux, Windows, bare metal, and many others. Like Apple's universal binaries, but actually universal, not just covering the M1's arm64 and x86-64.


It is x86-only; ARM is not supported. But it would be a very nice addition :)


non-x86 is emulated and bare-metal limits the portability.


You can compile a program on MacOS. The result, an executable binary, can run on Linux, Mac, Windows, FreeBSD, OpenBSD, NetBSD, BIOS .... the same EXACT binary!


It makes binaries that can run anywhere by shimming(?) libc functions and values with a custom loader/linker.


Something like this could be great for open source projects that target a lot of platforms. For example, Conda-Forge [0] has automated build pipelines for Linux, Mac and Windows. Perhaps something like this could let you reduce that to an x86 build and an Arm build... removing an entire dimension from the build matrix.

Gonna need to solve that 1/2=g problem before Numpy is happy though!

[0] https://conda-forge.org/


Maybe Cosmopolitan could be used with luastatic[1] to compile a Lua program to an "Actually Portable Executable":

  CC="" luastatic main.lua
  # Compile main.luastatic.c with Cosmopolitan Libc
[1] https://github.com/ers35/luastatic


I wonder if you could somehow get this working with libraries that are widely ported across platforms already like SDL.


It might take a bit of effort, but should be possible.

One problem for libraries like SDL is that they disable all their per-platform code with #ifdefs. To build a static, cross-platform version you would have to convert all those #ifdefs to runtime branching, dynamically load libraries, and provide header files for all platforms.
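
As a toy illustration of what that conversion means (placeholder sleep helpers rather than SDL's real code; IsWindows() is the kind of runtime check Cosmopolitan exposes):

    #include <stdint.h>

    extern int IsWindows(void);               /* runtime host-OS check    */
    extern void win_sleep_ms(uint32_t ms);    /* placeholder for Sleep()  */
    extern void posix_sleep_ms(uint32_t ms);  /* placeholder for usleep() */

    /* The usual library style: the platform is chosen at compile time, so
       the resulting object only works on the platform it was built for. */
    void sleep_ms_compiletime(uint32_t ms) {
    #ifdef _WIN32
        win_sleep_ms(ms);
    #else
        posix_sleep_ms(ms);
    #endif
    }

    /* The run-anywhere style: both branches are compiled in, and the
       program picks one after detecting which OS it landed on. */
    void sleep_ms_runtime(uint32_t ms) {
        if (IsWindows()) {
            win_sleep_ms(ms);
        } else {
            posix_sleep_ms(ms);
        }
    }

Dynamically loading the platform's own libraries, as mentioned above, is the other half of the work.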


Could WebAssembly also be used for this?


Depends on what you mean by "using". It is conceivably possible to compile a WebAssembly interpreter like wac[1] with Cosmopolitan, which would then run on all OSes and bare metal, yes.

[1]: https://github.com/kanaka/wac


That's not as portable, since you need an interpreter.


Portable Lua is a great idea. By using Cosmopolitan, does this also run on bare-metal machines? Portable MicroPython might also be a low-hanging-fruit target.


Does anyone know if there is any way to run another exe/binary that is bundled in a cross-platform app, while it's all enclosed in one binary file?

Because Cosmo could work as a wrapper of windows/mac/linux binary executables and then the cosmo app can decide which one to invoke.

Based on my understanding, if program A is to run program B, program B needs to be addressable in the filesystem. So based on my testing, the bundled executables need to be written to disk first, and then the main program can run them. This might also work with macOS .app folder formats, but I'm not sure how to do it on Linux to avoid chmod +x issues, or how to have one binary file on Windows.

Any ideas?
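
For reference, here is roughly the write-to-disk-then-exec approach I mean (a POSIX-only sketch with a made-up get_embedded_blob() helper; Windows would need CreateProcess and an .exe suffix instead):

    #include <stdlib.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* placeholder for however the payload is stored, e.g. a byte array
       linked into the wrapper or an entry in the executable's zip section */
    extern const unsigned char *get_embedded_blob(size_t *size_out);

    /* write the bundled program to a temp file, mark it executable, run it */
    pid_t run_embedded(char *const argv[], char *const envp[]) {
        size_t size;
        const unsigned char *blob = get_embedded_blob(&size);

        char path[] = "/tmp/bundled-XXXXXX";
        int fd = mkstemp(path);                  /* created with mode 0600 */
        if (fd < 0) return -1;
        if (write(fd, blob, size) != (ssize_t)size) { close(fd); return -1; }
        fchmod(fd, 0700);                        /* the chmod +x step */
        close(fd);

        pid_t pid = fork();
        if (pid == 0) execve(path, argv, envp);  /* child becomes program B */
        return pid;                              /* parent can waitpid() and unlink(path) */
    }

Cleanup of the temp file and error handling for fork() are left out, and a noexec /tmp would break it.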


Huh?

> Cosmo could work as a wrapper of windows/mac/linux binary executables and then the cosmo app can decide which one to invoke.

It sounds like you misunderstand the reason for the project and that you're trying to add another layer of indirection to the thing that is already able to achieve the effect that you seek to accomplish with that layer of indirection. No need for the indirection, just use the thing directly.

If you want to run $APP on platforms X, Y, and Z, then you'd use Cosmopolitan to build the $APP binary as an Actually Portable Executable, and you'd run that executable on any of those platforms.


When the source is available, then of course everything can be compiled in one nice executable. ;)

I am referring to cases where the source is not available or it's hard to set up some projects together. Some projects (usually hardware related) require different compilers, compile options, etc. If the project already has the binaries, then for distribution maybe a wrapper could be a good option.


Would these binaries work on Apple Silicon? It seems like no, since it isn't x86? What are the limits to the portability, and is there a path past those limits?


That's right, it only supports x86. However the binary will detect other architectures and try to run itself with QEMU in that case.


I would love for someone to do this experiment.

In principle, Rosetta should detect that it's x86, perform the translation, and everything should "just work".

In practice, it's possible that the weird things APE is doing to get cross-platform compatibility will undermine one or more of Rosetta's assumptions.

I don't have an M1 Mac, so I'm not in a position to try it. But someone should!


Learned a lot from this and related pages, thanks a lot.

It is one of the best hacks I've ever seen, ever. Although, as others have pointed out, I wouldn't consider it exactly elegant (IIUC it's basically a shell script that morphs the file on first run on *nix); nor exactly ground-breaking with the fat-binary-like concept: if you don't use platform-specific libraries or calls anyway, cross-compiling works just fine? What's the use case here? I mean, for platforms where the performance or size overhead of VM-based languages is a problem, you probably don't want it to include extra code for other platforms anyway?

OTOH Blinkenlights looks really nice tho. I've seen, with my own eyes, a dude live hand-editing code page 417 with notepad.exe to change a jmp so it doesn't look for a CD ;)

Always admire hackers of this rank. I think Blinkenlights could help me achieve something similar someday if I allocate bandwidth to it; I would spend lots of time playing with it.


I'm not any more or less excited about this from a utility point of view than I am about golang's support for embedding data in an executable.

It is pretty damn impressive, though. I'll give it that.

It's like that cat pushing a watermelon meme. This is a binary that runs on seven operating systems. Your argument is invalid.


Amazing accomplishment!

Not to detract from the technical breakthrough, but won't this be a major gift to virus writers?


The hard part when making a virus is to exploit the target system to run said virus. And if you have such an exploit, then supplying a compatible binary depending on which OS the target system runs is easy.

Exploit attempts for embedded systems (routers, cameras, etc.) typically start with attempting to execute code in a portable language such as shell scripts (which would be the only thing this new project would replace); those scripts try to detect the architecture of the system and then download the appropriate binary (the server hosting the malicious binaries provides many variants for different architectures).

In the end I don't see this solving a real problem when it comes to malware - this problem has already been solved through other means.


Cosmopolitan Libc is designed to put power in your hands, and power can be used for good or bad. The best way for us to all keep that power, is to start using Cosmopolitan to do as many good deeds as possible.

This project is going to benefit developers on all platforms, because it supports everyone without bias. Indie developers are going to have more opportunities to be successful writing native apps, since Cosmo helps them reach a broader audience. Before Cosmopolitan only big companies could introduce new projects (e.g. TensorFlow) that effectively solve the portability problem, since the only way to do it before was brute force cash. Lastly, Cosmopolitan is going to relieve language authors of many of the portability burdens they've each needed to carry on their own, which means they're going to have more time to focus on their visions.

If we don't use Actually Portable Executable to accomplish grand acts of public service, then operating systems are just going to block it. For example, UPX is a project that does creative things with executable formats. If you read the XNU source code you'll notice that they have explicit source code for blocking those executables and they call out the project by name.


Just because an executable is cross-platform doesn't mean it can exploit issues on other platforms. Those issues have to actually exist there, and most vulnerabilities don't look like that.


The author mentions the portable binary is smaller than the original. It seems Cosmopolitan's output is actually an executable pkzip file: https://justine.lol/ape.html


That's how I read it the first time through, but on a closer look, they're not comparing a normal native binary to an APE binary. They're comparing an intermediate step of the APE build process to the final output.

`objcopy -S -O binary` does two things. One is to strip all the ELF headers and just dump the section contents into an output file. It looks like the input to this command is an executable within an executable, with the BIOS boot sector, ELF segments, Mach-O headers, etc. all embedded as section contents within an outer ELF file; objcopy then removes the outer shell. Clearly a necessary step, but it won't make much difference in binary size. However, the `-S` tells objcopy to strip debug information, which is included in the original since `-g` was passed to `gcc`. This is presumably responsible for almost all of the difference; debug info is notoriously huge. You can get a similar size reduction in a normal compilation by running `strip`.

Edit: But objcopy does not do anything fancy like compress the input. It looks like the pkzip part only contains associated files the binary might need (e.g. "hellojs.com" contains a JavaScript file and time zone files), not the executable code itself.


Portable Lua is a great idea. By using Cosmopolitan, does this also run on bare-metal machines?


It would be fantastic to see a byte-by-byte breakdown of the universal header.


The dream of "Write Once, Debug Everywhere" will never die


I wonder if there's a compatibility/sanity test suite for Lua? It would be nice to run this interpreter with real scripts to see how it compares.


I was trying to get something from Nim to build like this, but I probably need to understand linking a bit better.


This actually comes as a surprise. Just to ask, as a test case -

Can we have an actually portable (Common) Lisp/Python interpreter with at least a terminal interface?

Can the terminal have something like VT-100 or another basic interface?

And can we even run JavaScript to do screen-related items ... like https://github.com/d3/d3


Yes. She built a JavaScript interpreter with Cosmopolitan. Since the executables are also zip files, the interpreter can even load its own sources from the binary itself.

https://github.com/jart/cosmopolitan/blob/master/examples/he...


Actually portable malware coming in 3,2,1,...


Cool hack but extremely fragile. It's actually pretty sad that people will consider using something like this in real projects.


I'm not sure what problem Cosmopolitan is trying to solve.

I think it's an interesting research project, but from a practical perspective - there are significantly better (and more mature) ways to achieve cross-platform compatibility that don't involve many of the compile-time and run-time hacks required to get Cosmopolitan to work.

"Actually Portable Executables" are not any more "portable" than a statically compiled .exe running natively on Windows, or using Wine[0] on a Linux, as an example.

[0] https://en.wikipedia.org/wiki/Wine_(software)


I'm sorry but WINE is the kind of interpreter that gives me enough time to take a coffee break as it loads solitaire.exe. I needed to use it a few weeks ago because I'm supporting someone who needs it for their CI environment. I discovered that APIs for things as fundamental as memory, which Microsoft introduced back in 2006, hadn't been implemented in the WINE release that Debian installed on my machine, which was made in 2019. As far as I can tell the project wasn't funded for the longest time, which is unfortunate, because a once-great tool then succumbed to the tragedy of the commons.

That's why I hope Cosmopolitan can have a positive impact. Because interpreters are great but we shouldn't need to rely on them as much as we currently do. The only thing your Actually Portable Executables need is the canonical stock preferred kernel interfaces that have indefinite stability promises backed by the most powerful technology corporations on earth. No one got fired for depending on the IBMs of the world. Your software is going to have a long life with little maintenance and best of all it's going to be fast.


Wine is actually quite similar to this: It's not an interpreter or an emulator, it natively runs the executable on the host CPU. What it does is provide the WinAPI dynamic libraries that the programs expect (so they can call functions like CreateFile) and a loader that can load the exe files (PE) into memory and knows where to jump to.


Thanks, but I still can't tell from your response what problem Cosmopolitan is trying to solve.

I came across some of the following statements in your website (https://justine.lol/cosmopolitan/index.html):

> "Cosmopolitan makes C a build-once run-anywhere language, similar to Java, except it doesn't require interpreters or virtual machines be installed beforehand"

Glancing at the code so far, Cosmopolitan is really just an alternative libc implementation. It's important we acknowledge that there's only so much you can do with a libc implementation: it's a paper-thin abstraction layer for a small subset of rudimentary APIs.

It's common for software today to depend on operating system interfaces (which are platform-specific), either directly (via static/dynamic linking with operating system libraries) or indirectly (delegating to a runtime environment such as Java, Python, .NET Core, Mono, Wine, and others). I don't see how Cosmopolitan caters for these scenarios, and I don't see why you would make that comparison in the first place.

> "Cosmo provides the same portability benefits as high-level languages like Go and Rust, but it doesn't invent a new language and you won't need to configure a CI system to build separate binaries for each operating system"

Cosmopolitan doesn't change anything about the C language itself, right? How portable a certain C project is depends entirely on the business logic within the code. Any C code using nothing but libc is pretty much guaranteed to be "portable" (it can be built using almost any combination of hardware, operating systems and compilers) - I'm not sure what Cosmopolitan contributes to the portability story?

> "What Cosmopolitan focuses on is fixing C by decoupling it from platforms"

C is already "decoupled from platforms", so I'm confused by this statement. There are a lot of projects out there that don't use libc at all, or use their own custom abstractions on top of platform-specific code (#ifdefs, fancy buildsystems, etc). There's nothing about Cosmopolitan itself that makes it somehow "decoupled from platforms": the decoupling happens by virtue of writing code that relies on the public interfaces exposed by libc, and the ability to swap libc at both compile time and runtime.


HN has been obsessed with this project for the past few days, and that might be why you're getting so aggressively downvoted. Cosmopolitan is interesting from a technical perspective, but doesn't really bring anything new to the table.

Packaging multiple entry points in the same file is a clever hack but is extremely brittle (for example, archiving/re-archiving can mangle relevant sections which ZIP doesn't care about -- this will break your binary), ELF uses a pre-processing step which renders the binary non-portable once it's run, and the cross-platform libc stuff has been done since time eternal.

Not to mention that, as you say, just C is nigh useless these days, anyway. It's kind of funny to see people comparing this to Electron.


Some of these statements are very misleading too.

> Cosmopolitan makes C a build-once run-anywhere language, similar to Java, except it doesn't require interpreters or virtual machines be installed beforehand. Cosmo provides the same portability benefits as high-level languages like Go and Rust

That is simply not true. Java, Go and Rust can run programs at native speeds on non-x86-64 architectures, e.g. ARM, but Cosmopolitan cannot, not now and not ever because x86-64 is a terrible IR.


> Java, Go and Rust can run programs at native speeds on non-x86-64 architectures, e.g. ARM, but Cosmopolitan cannot

These things are not mutually exclusive. Nothing is stopping you from shipping a JVM, Go, or Rust app on top of Cosmopolitan.


Actually Cosmopolitan doesn't support threads so Java, Go and Rust won't work at all on top of it. https://github.com/jart/cosmopolitan/blob/master/libc/runtim...


That's not the claim though. The claim is that Cosmopolitan offers the same portability benefits as Java, Go and Rust, not that Cosmopolitan + (Java, Go, Rust) offers the same portability benefits as (Java, Go, Rust).

Also the latter claim is not true AFAIK. For example if Cosmopolitan were to support all the architectures Rust supports, at native speeds, it would have to support packaging the compiled code for those different architectures into some kind of super-fat-binary. Currently it only supports x86-64.


For the record, I've tried running a recent game (made in Unity, I believe) on WINE a while ago and it worked like a charm. So I guess it's hit or miss?


I'm surprised by this. I'm using WINE to play League of Legends on Linux and it works surprisingly well.


Once an application is running it's reasonably fast, but Wine has always taken an endless amount of time to start up, at least on my machine. It feels similar to Java VMs in that respect.


Ah yes I can attest to that part. It does take quite a long time to start.


You're thinking in the wrong direction. WINE can let Linux users run Windows binaries, but then as a programmer you need to write Windows-y C. What makes this appealing to me is that as a Windows user I can write UNIX-y C and it will just work.

Obviously Windows could also be supported by recompiling under Cygwin or MSYS2, but those tools are challenging. Compiling code designed for a UNIX on Windows is a pain in the ass, and sometimes the end result has strange behaviors. Upstream projects tend not to care much about Windows support, so if you try to fix it you're on your own.

Nowadays we have WSL so we can just run Linux binaries on Windows as-is, but that's still a fairly heavyweight solution.

Wouldn't it be better to just compile once on UNIX, link to a magical portability library, and now you have a binary that just works everywhere? I think so.


> Obviously Windows could also be supported by recompiling under Cygwin, or MSYS2

MSYS2 is a dream. You install it, specify your dependencies using Arch's pacman, and go to town. No fiddling with Windows configuration files or paths, it just works. In fact it has been easier to get new devs up and running in that environment than in Linux, where they tended to smear build-time dependencies across the bare metal install. Of course there are solutions for that, but those solutions are more of a pain to set up than MSYS2.


I'd like to see an MSYS2 compiled with cosmopolitan


The cool thing is that it wouldn't really be MSYS2. It would just be coreutils, or binutils.

The dream is that a programmer can just write their tool, using standard ISO C, maybe some POSIX, then compile and link it to Cosmopolitan and they're done. No need to set up any special configs or ifdefs. No need for any platform specific distributions. You just compile once and now you have the canonical version.


I think the major problem it solves is not requiring multiple binaries to be distributed, one for each OS, since they all run on the same architecture. By distributing a single binary that really runs anywhere without requiring another solution, like a VM, an emulator or a compatibility layer, it simplifies the installation process and allows projects to be available to more users without increasing the build complexity.


This is a theoretically purer solution because you are breaking compatibility down to its common factor, the processor, and then doing as little work as possible to get to the point of portability. Less abstraction than other solutions. The holy grail would be a generalised solution for any processor, using nothing but electrical-engineering first principles, but I don't know if that's possible, and maybe the better solution is to just not have the Frankenstein's monster of hardware we have today.


I tried to get Wine to run a .NET Framework console app last week. It was the first time I've tried to use Wine, but it was way more complicated than I thought it was going to be - many magical incantations are needed, which vary by OS, hardware platform, OS platform and binary platform. And it's very slow. I spent about 4 hours trying to get it working, and after all that time, all I'd achieved was bombing on a Crypto API error.

I gave up.


Yes, as they require significantly less stuff you need to deal with. WINE needs to implement vast amounts of Windows APIs to make something like .NET work, while Cosmopolitan limits you to low-level C (and things built on it, like Lua) at compile time.


> while Cosmopolitan limits you to low-level C (and things built on it, like Lua) at compile time

You can build very little with the C Standard Library alone, wouldn't you agree? Things like graphics, anything beyond the most rudimentary I/O, etc - are not a part of the C Standard Library.



