
most likely before v2 as well. The original post is from July 2021, which is a bit after Cosmopolitan Libc's v1 IIRC.


Yes, you're correct. I had in mind the 2022 date of the update. It also looks like you've updated the article again today! Thanks


My 4-core computer was probably building Cosmopolitan Libc in the background, plus the Python binary is an unoptimized 2.7 build. This blog post is originally from 2021.


I've tried nuitka before, and a recent question that occurred to me was: does nuitka have an option to output just C files? Something like:

  python -m nuitka example.py --no-compile
Might be interesting to see if the above is possible. We could get things like a completely-statically-compiled Python stdlib within the APE.


Hey, this is my post from 2021, when I was testing Python2.7 and Python3.6 with Cosmopolitan Libc on an old 4-thread Haswell. It's now a lot easier to build Python (and gcc, gnu coreutils, curl etc.), and the binaries are faster, multi-threaded, and quite convenient to use. There are lots of interesting directions to explore when it comes to building software with Cosmopolitan Libc.


Thanks for updating the blog post too with the datasette screen recording, the speed difference is quite noticeable.

Adding it here since a few people were wondering about it in the comments, but feel free to check the original article for the 2024 update:

https://ahgamut.github.io/images/ape-datasette.gif


> new methods of communication with the compiler have been established.

From what I understand, this appears to be a separate binary from GCC/Clang that does static analysis and outputs C99.

Can this be a GCC plugin? I know we can write plugins that are activated when a specific macro is provided, and the GCC plugin event list allows intercepting the AST at every function declaration/definition. Unless you're rewriting the AST substantially, I feel this could be a compiler plugin. I'd like to know a bit more about what kinds of AST transformations/checks are run as part of Cake.
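
Roughly the kind of skeleton I have in mind (PLUGIN_FINISH_DECL is a real GCC plugin event; the callback body is just a placeholder, and modern GCC expects plugins to be built as C++):

  /* sketch.c -- minimal GCC plugin skeleton, illustrative only.
     build: g++ -shared -fPIC -fno-rtti \
            -I$(gcc -print-file-name=plugin)/include sketch.c -o sketch.so */
  #include "gcc-plugin.h"
  #include "plugin-version.h"
  #include "tree.h"

  int plugin_is_GPL_compatible;

  /* called once for every finished declaration */
  static void on_finish_decl(void *gcc_data, void *user_data) {
    tree decl = (tree)gcc_data;
    if (TREE_CODE(decl) == FUNCTION_DECL) {
      /* inspect (or rewrite) the function declaration here */
    }
  }

  int plugin_init(struct plugin_name_args *info,
                  struct plugin_gcc_version *version) {
    if (!plugin_default_version_check(version, &gcc_version))
      return 1;
    register_callback(info->base_name, PLUGIN_FINISH_DECL, on_finish_decl, NULL);
    return 0;
  }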


Cake is a C23 front end, but it can also be used as a static analysis tool. The qualifiers can be defined as empty macros, so the same code can be compiled with gcc and clang while the ownership static analysis is done with cake.

Inside Visual Studio, for instance, we can add it under External Tools:

  C:\Program Files (x86)\cake\cake.exe $(ItemPath) -msvc-output -no-output -analyze -nullchecks

The main annotations are qualifiers (similar to const). C23 attributes were considered instead of qualifiers, but qualifiers have better integration with the type system. In any case, the qualifiers can be declared as empty macros when necessary.

The qualifiers and the rules can be applied to any compiler. Something harder to specify (but not impossible) is the flow analysis.
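
For example, a sketch of the empty-macro approach (the macro and predefined-symbol names here are illustrative, not necessarily the exact spellings cake ships):

  /* illustrative only: hide the qualifier from compilers that don't know it */
  #ifdef __CAKE__          /* assumed predefined macro; the real name may differ */
    #define owner _Owner   /* cake sees the real ownership qualifier */
  #else
    #define owner          /* gcc/clang/msvc see an ordinary pointer */
  #endif

  int * owner p;           /* analyzed by cake, plain `int *` elsewhere */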

A sample of a rule that any compiler could enforce:

  int * owner a;
  int * b;
  a = b;

We cannot assign a view to an owner object; this kind of rule does not require flow analysis.
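
By contrast, a check that does need flow analysis (illustrative):

  int * owner p = malloc(1);
  free(p);
  free(p);   /* flagging this double free requires tracking the state of p
                across statements, not just looking at one assignment */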


> Instead, many have deeply underestimated LLMs, saying that after all they were nothing more than somewhat advanced Markov chains, capable, at most, of regurgitating extremely limited variations of what they had seen in the training set. Then this notion of the parrot, in the face of evidence, was almost universally retracted.

I'd like to see this evidence, and by that I don't mean someone just writing a blog post or tweeting "hey I asked an LLM to do this, and wow". Is there a numerical measurement, like training loss or perplexity, that quantifies "outside the training set"? Otherwise, I find it difficult to take statements like the above seriously.

LLMs can do some interesting things with text, no doubt. But these models are trained on terabytes of data. Can you really guarantee "there is no part of my query that is in the training set, not even reworded"? Perhaps we can grep through the training set every time one of these claims is made.


Exactly. I think that it’s very hard for us to comprehend just how much is out there on the internet.

The perfect example of that is the tikz unicorn in the Sparks paper. It seemed like a unique task, until someone found a tikz unicorn on an obscure website.

There is plenty of evidence that LLMs struggle as you move out of distribution. Which makes perfect sense as long as you stop trying to attribute what they’re doing to magic.

This doesn’t mean they’re not useful, of course. But it means that we should be skeptical about wild capability claims until we have better evidence than a tweet, as you put it.


They didn't actually find a unicorn; they found other TikZ animals. It still generalized to the unicorn.

This was the package: https://ctan.org/pkg/tikzlings?lang=en


>Can you really guarantee "there is no part of my query that is in the training set, not even reworded"?

I mean..yes?

Multi digit arithmetic, translation, summarization. There are many tasks where this is trivial.


What packages do you use in your app? If you're not using too many C extensions, it may be possible to build your app with Cosmopolitan.


Lots of C extensions. Has anyone gotten PyTorch working with Cosmopolitan? With CUDA?


Try it out! We got Lua/LuaJIT, Python, PHP, and Rust building (the latter two are not fully automatic yet), so Ruby might be possible even now.


While I have no experience building Ruby, if standard Ruby gives you trouble it might be worth trying mruby - it is likely easier to build, and easier to debug the build process.


Counterintuitively, having messed around with building mruby for a project a few months ago, I found it harder to deal with. It's a lot more hardcore, with vastly fewer resources and more sharp edges, and is orders of magnitude less popular, so you find yourself hitting basic problems you would think would have been solved years ago.

I'd stay away from it unless you have a very pressing need.


https://github.com/ahgamut/rust-ape-example

My above repo contains examples with the Rust standard library that build as fat executables with Cosmopolitan Libc.

I also got ripgrep to build https://github.com/ahgamut/ripgrep/tree/cosmopolitan, but it wasn't part of the cosmocc binaries because some crates that are part of the build require `#![feature(rustc_private)]` to be added to their source code. Same goes for bat and fd.

To summarize, Rust CLI tools can be built with Cosmopolitan Libc, but to build them like we can build gcc/emacs/git right now, we will need some support and Rust expertise.


Last time I looked into Rust-on-Cosmopolitan, I couldn't make it work: Cosmopolitan determines syscall numbers, signal numbers, and such at runtime, but the Rust standard library assumes compile-time constants (basically the same issue that C switch-case has). How did you get around that? Have you tested these binaries on non-Linux platforms?
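
(Roughly the C version of the problem, for context: Cosmopolitan exposes many of these as variables whose values are picked at startup for the host OS, something like the sketch below, so they can't be used where integer constant expressions are required.)

  /* sketch: a constant resolved at runtime instead of a #define */
  extern const int SIGUSR1;

  void handler(int sig) {
    if (sig == SIGUSR1) { /* fine: ordinary runtime comparison */ }
    /* switch (sig) { case SIGUSR1: ... }  would not compile, because
       case labels must be integer constant expressions */
  }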


We have about 700 test executables and two programs called runit/runitd which remote-execute them across our entire test fleet each time we run `make test`. Currently, the fleet consists of `freebsd rhel7 xnu win10 openbsd netbsd pi silicon`. We used to have rhel5 and win7 in there too, but I've been slacking off the past few months. You can watch a video showing that running all the test executables on all the systems takes only about fifteen seconds. https://justine.lol/sizetricks/#why


To clarify, are any of those ~700 executables written in Rust?


Yes, so pip works because the Python APE has OpenSSL built-in.

pip install requires modifying the APE, so I end up installing pure-Python libraries as follows:

    mkdir Lib
    ./python -m pip download httpx      # fetch wheels into the current dir
    for w in ./*.whl; do                # httpx pulls in several dependency wheels
      unzip "$w" -d ./Lib
    done
    mv python python.com # in case the next step doesn't work
    zip -r ./python.com Lib             # the APE is also a zip, so append Lib/
    mv python.com python
Installing CPython extensions like this is an unsolved problem, but I think there might be some interesting workarounds possible.

