The author, Marcus Hutter, created the "Hutter Prize" [1], which offers a €50,000 prize to anyone who can compress a 100MB Wikipedia file to under 15MB.
From the prize site:
This compression contest is motivated by the fact that being able to compress well is closely related to acting intelligently, thus reducing the slippery concept of intelligence to hard file size numbers.
Hmm, the prize is defined to be at most €50,000. The formula is €50,000 × (1 − S/L), where S is the size of your compressed file plus your (de)compressor and L is the size of the current best entry. So if you produce a compression 5% smaller than the current record holder's, you get €2,500. The whole €50,000 is only reachable with an infinite compression ratio.
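For concreteness, here is that payout formula as a quick Python sketch (the €50,000 fund and the example sizes restate the numbers above; everything else is illustrative):

    def hutter_prize(new_size_bytes, record_size_bytes, fund_eur=50_000):
        """Payout = fund * (1 - S/L), where S is the new entry's total size
        (compressed data + decompressor) and L is the current record."""
        return fund_eur * (1 - new_size_bytes / record_size_bytes)

    # Beating a 15MB record by 5% pays out 5% of the fund:
    print(hutter_prize(new_size_bytes=14_250_000,
                       record_size_bytes=15_000_000))  # ~2500 (euros)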
I might be missing something here. Where does intelligence come in? On any given deterministic computer platform, there is a finite number of 15MB executables. And there is a finite number of 100MB text files. But the second number is much larger than the first, and thus not every 100MB text file has a 15MB executable that outputs it.
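The counting argument is easy to make concrete; the byte counts below just restate the 15MB/100MB figures from the challenge:

    # Number of distinct bit strings of length at most 15MB vs exactly 100MB.
    # (We only look at the log2 of the counts; the counts themselves are astronomical.)
    bits_15mb = 15_000_000 * 8       # 1.2e8 bits
    bits_100mb = 100_000_000 * 8     # 8e8 bits

    # There are fewer than 2**(bits_15mb + 1) programs of size <= 15MB,
    # but 2**bits_100mb distinct 100MB files -- vastly more, so most
    # 100MB files have no 15MB program that outputs them.
    print(bits_15mb)     # 120000000
    print(bits_100mb)    # 800000000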
Of course, maybe we can find a 15MB executable that outputs the specific Wikipedia file he’s asking for, and there might be clever ways to search for or construct that executable on a specific computer platform, but it doesn’t strike me as some particularly generalizable sense of compression requiring intelligence.
He has a book, "Universal Artificial Intelligence", that defines what he considers AI and then solves it. The only catch is that the solution relies on Kolmogorov complexity, which is incomputable.
The Kolmogorov complexity of a string is (loosely) the size of the smallest possible Turing machine that outputs that string when run. So a string of 1000 "a"s has a lower complexity than a random string of length 1000.
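A toy way to see the asymmetry, with Python standing in for the Turing machine and program length standing in for Kolmogorov complexity (which it only upper-bounds):

    import random
    import string

    # A highly regular string has a very short program that generates it...
    regular = "a" * 1000               # generated by: print("a" * 1000)

    # ...while a random string (with overwhelming probability) has no generator
    # much shorter than the string itself: the shortest "program" is essentially
    # a literal copy of the data.
    rng = random.Random(0)
    noise = "".join(rng.choice(string.ascii_lowercase) for _ in range(1000))

    print(len('print("a" * 1000)'))    # 17 characters of "program"
    print(len(noise))                  # 1000 characters, roughly incompressible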
He sees GAI as a system that, given all its input up to now (from sensors and the like) and some value-function that defines the goal we're trying to reach, takes the action that has the highest expected value of the value-function. It's hard to give a better definition of intelligence in a completely general sense: given all input, choose the most likely best action.
That requires some guessing about the future: given all this input so far, for all possible next inputs in the next time step, what is the likelihood of each.
And his answer is: the likelihood is tied to the Kolmogorov complexity of the sequence of all inputs so far plus the possible future input we're currently considering; the continuation that gives the whole sequence the lowest Kolmogorov complexity is the most likely one. The idea is that there is information in the inputs so far, and the continuation that best exploits it keeps the complexity low. If we've seen 999 "a"s, we expect another one. If we've seen a car-shaped object slide left to right on previous camera images, we expect it to continue its movement. If the input so far is completely random, then every possible future is equally likely. Et cetera.
Given that, it becomes possible to calculate the value function for every combination of possible action and possible next sensor input, and the AI can know what to do.
Except of course for the detail that this is (wildly) incomputable.
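Restricted to a tiny hand-picked model class it does become computable, and a toy caricature (emphatically not Hutter's actual formulation, which mixes over all programs) shows the shape of the decision rule: weight each candidate model by 2^-(description length), use the mixture to predict the next observation, and take the action with the highest expected value. Everything below is invented for illustration:

    from math import inf

    # Toy "environment models": each has a description length in bits (a stand-in
    # for Kolmogorov complexity) and a predictor mapping (history, action) to a
    # distribution over the next observation.
    MODELS = [
        (10, lambda h, a: {"a": 0.9, "b": 0.1} if a == "left" else {"a": 0.2, "b": 0.8}),
        (25, lambda h, a: {"a": 0.5, "b": 0.5}),   # "the world is a coin flip"
    ]

    def value(observation):
        """Hypothetical value function: we want to see 'a'."""
        return 1.0 if observation == "a" else 0.0

    def choose_action(history, actions=("left", "right")):
        # Solomonoff-style prior: weight 2**-K for a model of description length K.
        weights = [2.0 ** -k for k, _ in MODELS]
        total = sum(weights)
        best_action, best_value = None, -inf
        for action in actions:
            expected = 0.0
            for w, (_, predict) in zip(weights, MODELS):
                for obs, p in predict(history, action).items():
                    expected += (w / total) * p * value(obs)
            if expected > best_value:
                best_action, best_value = action, expected
        return best_action

    print(choose_action(history="aaab"))   # prints: left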
But what is somewhat correlated to Kolmogorov complexity? Compressibility.
So that is why Hutter is interested in better compression algorithms.
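That substitution is easy to demo with an off-the-shelf compressor standing in as a crude, computable surrogate for the ideal one:

    import os
    import zlib

    structured = b"abc" * 100_000      # highly regular: low Kolmogorov complexity
    random_ish = os.urandom(300_000)   # incompressible with overwhelming probability

    # The compressed size upper-bounds the Kolmogorov complexity
    # (up to the fixed size of the decompressor).
    print(len(zlib.compress(structured, 9)))   # a few hundred bytes
    print(len(zlib.compress(random_ish, 9)))   # ~300,000 bytes, if not slightly more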
All this is very loosely stated from memory; I was interested in this stuff over a decade ago...
I've got a wife and kids, so I have no free time anymore, and nowadays I feel there are far more important problems in the world than AI (climate change, huge loss of biodiversity, cheap and very effective propaganda).
I don't dislike this challenge, and I don't dislike code golf. What I don't understand is the notion that compressing a 100MB Wikipedia file somehow requires an "intelligent" compression algorithm. If the challenge were to compress any English text down to a certain ratio, then perhaps making the algorithm behave "intelligently" would be a requirement. But as far as I can tell, this challenge asks for a lossless reproduction of one specific chunk of data, so it seems to me that we just have to hope that, on a given platform, there exists a 15MB executable that outputs that exact chunk of data. One might exist, but one might not. And you probably won't even be able to prove the latter, because of the halting problem (technically it doesn't apply to a computer with finite memory, but in practice no one will wait long enough).
Not just any chunk though. It (presumably) contains a lot of structure that encodes information about [English representations of] the real world.
It's not random, nor is it simple, nor is it an image of a fractal (maybe?) etc.
A system that is able to compress it well makes good predictions about what the data contain, which indicates "intelligence" about the structure of the (encoded) world.
But would you expect that program to itself be useful for anything else? It literally just takes no input and outputs that one specific Wikipedia dump. It’s not like I can tweak it slightly and have it output new predictions of things that sound like Wikipedia articles.
Unless the idea is that whatever methodology is required to build this particular program could then be adapted to do other things that seem intelligent. That seems possible, but not at all likely given my reading of the challenge requirements.
See the links in my other comment in the thread for all the gory details.
> But would you expect that program to itself be useful for anything else?
That exact program, no, but the process of creating that program would be educational, both the compressor that makes it and the process of making the compressor.
Another way of putting it is in terms of the "Twenty Questions" game. Which pre-selected twenty questions allow you to "cover" the most of the Universe of phenomena? If one set of 20 allows you to distinguish more of the Universe than some other set, then the first set is somehow a better model of the Universe. Does that help?
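For scale: twenty yes/no answers can distinguish at most 2^20, about a million, things, and a question set only approaches that bound when each question splits the remaining candidates roughly in half, i.e. carries close to one full bit. A tiny illustration (the 0.95 example is made up):

    from math import log2

    print(2 ** 20)   # 1048576: the ceiling on what 20 binary questions can distinguish

    def question_bits(p_yes):
        """Expected information, in bits, from a yes/no question that gets
        answered 'yes' with probability p_yes over the remaining candidates."""
        p_no = 1 - p_yes
        return -(p_yes * log2(p_yes) + p_no * log2(p_no))

    print(question_bits(0.5))    # 1.0 bit: an ideal, evenly splitting question
    print(question_bits(0.95))   # ~0.29 bits: a question that is almost always 'yes'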
> If one set of 20 allows you to distinguish more of the Universe than some other set, then the first set is somehow a better model of the Universe.
Is Akinator's 20q database available as a dataset for non-commercial research purposes? It'd be neat to attempt to compress it in a way where training with one half predicts the other half at the smallest trained model size.
The point of the challenge is to actually create a model that fits in 15MB together with its compressed data. Not just to prove its existence.
One of the best contenders is the open source cmix, which, among other tricks, uses LSTM neural networks to predict the next character/bit: http://www.byronknoll.com/cmix.html
I had to search out the size of the decompressor on a different page, but it proves you right. The compressed file is ~400KB smaller than the record holder and the cmix decompressor is only ~200KB.
I wonder what the memory/speed/efficiency tradeoffs look like for that kind of design.
It's related to intelligence in the sense that being able to predict well requires intelligence and helps compression. If you have an intelligent model that predicts new words with high confidence, then you can use fewer bits to encode them. If I were to be even more handwavy, I would say having the ability to summarize effectively is a sign of intelligence.
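The "fewer bits" part is just the standard arithmetic-coding bound: a symbol the model predicts with probability p costs about −log2(p) bits, so confident correct predictions are nearly free and uniform guesses are expensive. A back-of-the-envelope check (the probabilities here are made up):

    from math import log2

    def bits_to_encode(p):
        """Ideal code length, in bits, for a symbol the model assigned probability p."""
        return -log2(p)

    # A model that is 99% sure of the next word pays almost nothing when it's right...
    print(bits_to_encode(0.99))        # ~0.014 bits
    # ...while a clueless model choosing uniformly among ~50,000 words pays a lot:
    print(bits_to_encode(1 / 50_000))  # ~15.6 bits per word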
There is a much smaller set of useful 100MB text files than there is of total possible ones. If there are only 15 million such useful 100MB text files, we should be able to map them to 15MB executables.
[1] http://www.hutter1.net/prize/index.htm