
Code golf is at least anecdotal evidence that proving lower bounds on the Kolmogorov complexity of a particular datum requires intelligence.
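To make that concrete: finding a shorter program (the code-golf move) only ever witnesses a tighter upper bound; the hard part is arguing that no shorter program exists. A rough sketch, assuming a Python-ish setting, illustrative only:

    # Illustrative only: an upper bound on K(data) is trivial -- emit a
    # program that prints the data verbatim.
    def trivial_program(data: bytes) -> bytes:
        # K(data) <= len(data) + a small constant
        return b"import sys; sys.stdout.buffer.write(" + repr(data).encode() + b")"

    # A golfed or compressed solution is just a shorter program with the
    # same output, i.e. a tighter upper bound. A *lower* bound says no
    # shorter program exists, and that can't be established mechanically
    # in general.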


I don’t dislike this challenge, and I don’t dislike code golf. What I don’t understand is the notion that compressing a 100MB Wikipedia file somehow requires an “intelligent” compression algorithm. If the challenge were to compress any English text down to a certain ratio, then perhaps making the algorithm behave “intelligently” would be a requirement. But as far as I can tell, this challenge is asking for a lossless reproduction of one specific chunk of data, so it seems to me that we just have to hope that, on a given platform, there exists a 15MB executable that outputs that exact chunk of data. One might exist, but one might not. And you probably won’t even be able to prove the latter, because of the halting problem (technically it doesn’t apply to a computer with finite memory, but in practice no one will wait long enough).
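For scale, the “dumb” existence proof is just to embed a conventionally compressed copy of the file and write it back out — a rough sketch, assuming a hypothetical bundled file name:

    # Illustrative only: a program whose output is exactly one specific
    # chunk of data, built by embedding a bz2-compressed copy of it.
    import bz2, sys

    with open("enwik8.bz2", "rb") as f:   # hypothetical bundled file
        sys.stdout.buffer.write(bz2.decompress(f.read()))

General-purpose compressors leave that embedded copy somewhere around a third of the original 100MB, so the open question is whether the gap down to ~15MB (decompressor included) can be closed for this particular chunk.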


> one specific chunk of data

Not just any chunk though. It (presumably) contains a lot of structure that encodes information about [English representations of] the real world.

It's not random, nor is it simple, nor is it an image of a fractal (maybe?) etc.

A system that is able to compress it well makes good predictions about what the data contain, which indicates "intelligence" about the structure of the (encoded) world.
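The technical link, roughly: fed into an (idealized) arithmetic coder, a model that assigns probability p to the symbol that actually occurs pays about -log2(p) bits for it, so "predicts Wikipedia well" and "compresses Wikipedia well" are the same claim up to coder overhead. A minimal sketch, illustrative only:

    import math

    def ideal_compressed_bits(data: bytes, predict) -> float:
        # predict(prefix) returns a dict mapping next byte -> probability
        total = 0.0
        for i, b in enumerate(data):
            p = predict(data[:i]).get(b, 1e-9)  # better prediction => fewer bits
            total += -math.log2(p)
        return total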


But would you expect that program to itself be useful for anything else? It literally just takes no input and outputs that one specific Wikipedia dump. It’s not like I can tweak it slightly and have it output new predictions of things that sound like Wikipedia articles.

Unless the idea is that whatever methodology is required to build this particular program could then be adapted to do other things that seem intelligent. That seems possible, but not at all likely given my reading of the challenge requirements.


See the links in my other comment in the thread for all the gory details.

> But would you expect that program to itself be useful for anything else?

That exact program, no, but the process of creating that program would be educational, both the compressor that makes it and the process of making the compressor.

Another way of putting it is in terms of the "Twenty Questions" game. Which pre-selected set of twenty questions allows you to "cover" the most of the Universe of phenomena? If one set of 20 allows you to distinguish more of the Universe than some other set, then the first set is somehow a better model of the Universe. Does that help?
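Back-of-the-envelope, in case it helps (illustrative numbers only): 20 yes/no answers can distinguish at most 2^20 = 1,048,576 outcomes, and a question set gets closest to that bound when each answer splits the remaining candidates roughly evenly.

    import math

    print(2 ** 20)  # 1048576 distinguishable outcomes at best

    # Bits of information gained by a question that splits candidates p : (1 - p)
    def question_bits(p: float) -> float:
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    print(question_bits(0.5))  # 1.0 bit -- the best a single question can do
    print(question_bits(0.9))  # ~0.47 bits -- a lopsided, less useful question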


> If one set of 20 allows you to distinguish more of the Universe than some other set, then the first set is somehow a better model of the Universe.

Is Akinator’s 20q database available as a dataset for non-commercial research purposes? It’d be neat to attempt to compress it in a way where training on one half predicts the other half at the smallest trained model size.


The point of the challenge is to actually create a model that fits in 15MB together with its compressed data. Not just to prove its existence.

One of the best contenders is the open-source cmix, which, among other tricks, uses LSTM neural networks to predict the next character/bit: http://www.byronknoll.com/cmix.html
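For flavor, the context-mixing idea behind compressors like cmix can be sketched in a few lines — this is a drastically simplified illustration, not cmix's actual code, which is C++:

    import math

    def stretch(p): return math.log(p / (1.0 - p))    # probability -> logit
    def squash(x):  return 1.0 / (1.0 + math.exp(-x)) # logit -> probability

    # Several models each predict P(next bit = 1); a learned mixer combines
    # them in the logistic domain.
    def mix(probs, weights):
        return squash(sum(w * stretch(p) for p, w in zip(probs, weights)))

    # e.g. an n-gram model, a word model, and an LSTM each vote:
    print(mix([0.9, 0.6, 0.8], [0.5, 0.2, 0.3]))

The mixed probability then drives an arithmetic coder, and (roughly speaking) the mixer weights are updated after every bit.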


Seems like cmix compresses better than the winners, but uses far too much memory to be eligible for the prize.


I had to search out the size of the decompressor on a different page, but it proves you right. The compressed file is ~400KB smaller than the record holder and the cmix decompressor is only ~200KB.

I wonder what the memory/speed/efficiency tradeoffs look like for that kind of design.



