There's a section in the link above, "Further Recommended Technical Reading relevant to the Compression=AI Paradigm", where they define it in a reasonably precise mathematical way, and it's well accepted at this point. If you can take input and predict what will happen under each of the options available to you, you can direct your choices towards a certain goal. This ability to direct towards a goal effectively defines AGI. Tell it "make paperclips", and the AI observes the world, works out which decisions would maximize paperclip output, and then starts taking those decisions: that is essentially what we mean by AGI, and prediction is a piece of this.
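A toy way to see the shape of that predict-then-act loop (everything below is invented for illustration, not anyone's actual system):

    # Toy world, made up for illustration: state = paperclip count,
    # and the "model" predicts the outcome of each available action.
    def model(state, action):
        return state + {"make_clip": 1, "idle": 0}[action]

    def utility(state):
        return state  # the goal: more paperclips

    def act(state, actions=("make_clip", "idle")):
        # Predict each action's outcome, pick the one that best serves the goal.
        return max(actions, key=lambda a: utility(model(state, a)))

    print(act(0))  # -> make_clip

Prediction is the model here; everything else is just an argmax over predicted outcomes.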
I have no stake in this, btw; I just had a crack at the above challenge in my younger days. I failed, but I want to get back into it. In theory, a small LLM shipped without any pre-trained weights (to keep the decompressor small) that trains itself online on the input while feeding its predictions to an arithmetic coder, with the same process mirrored on the decompression side, should work really well here. But I don't have the time these days. Sigh.
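Roughly what I mean, with a toy order-0 byte model standing in for the LLM (a sketch under my own naming, not any existing codebase); the decompressor stays in sync because both sides update the model on exactly the symbols already coded:

    import math

    class AdaptiveByteModel:
        """Toy stand-in for the LLM: adaptive byte frequencies, updated
        after every symbol, exactly as the decoder would update them."""
        def __init__(self):
            self.counts = [1] * 256   # Laplace smoothing: no byte gets probability 0
            self.total = 256

        def prob(self, b):
            return self.counts[b] / self.total

        def update(self, b):
            self.counts[b] += 1
            self.total += 1

    def ideal_code_length_bits(data: bytes) -> float:
        """Bits an arithmetic coder driven by this model would emit;
        a real coder lands within ~2 bits of this for the whole stream."""
        model, bits = AdaptiveByteModel(), 0.0
        for b in data:
            bits += -math.log2(model.prob(b))  # cost of coding b under the current prediction
            model.update(b)                    # the decoder makes the same update, staying in sync
        return bits

    text = b"the quick brown fox jumps over the lazy dog " * 50
    print(f"{ideal_code_length_bits(text) / 8:.0f} bytes vs {len(text)} raw")

Swap the toy model for a small LLM trained online on the same prefix and you have the scheme above; the catch is that the model code itself counts against the decompressor size.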
> This ability to direct towards a goal effectively defines AGI
No it doesn't, though it may be argued to be a requirement.
That's the previous commenter's point: you are making unjustified assertions by extrapolating the views of some researchers. Reiterating the claim with a pointer to why they believe it doesn't make it any more justified.
If that's your favoured interpretation, fine, but that's all it is at this point.
Go argue with the scientists, who state pretty much what I just said verbatim, with full links to proofs, in http://prize.hutter1.net/hfaq.htm#ai :)
> One can prove that the better you can compress, the better you can predict; and being able to predict [the environment] well is key for being able to act well. Consider the sequence of 1000 digits "14159...[990 more digits]...01989". If it looks random to you, you can neither compress it nor can you predict the 1001st digit. If you realize that they are the first 1000 digits of π, you can compress the sequence and predict the next digit. While the program computing the digits of π is an example of a one-part self-extracting archive, the impressive Minimum Description Length (MDL) principle is a two-part coding scheme akin to a (parameterized) decompressor plus a compressed archive. If M is a probabilistic model of the data X, then the data can be compressed (to an archive of) length log(1/P(X|M)) via arithmetic coding, where P(X|M) is the probability of X under M. The decompressor must know M, hence has length L(M). One can show that the model M that minimizes the total length L(M)+log(1/P(X|M)) leads to best predictions of future data. For instance, the quality of natural language models is typically judged by its Perplexity, which is equivalent to code length. Finally, sequential decision theory tells you how to exploit such models M for optimal rational actions. Indeed, integrating compression (=prediction) into sequential decision theory (=stochastic planning) can serve as the theoretical foundations of super-intelligence (brief introduction, comprehensive introduction, full treatment with proofs).
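To make the code-length/perplexity equivalence in that quote concrete (the per-token probabilities below are made up purely for illustration):

    import math

    # Hypothetical probabilities a model M assigned to each observed token of X.
    probs = [0.25, 0.6, 0.1, 0.9, 0.33]

    code_length_bits = sum(-math.log2(p) for p in probs)  # log2(1/P(X|M)) via the chain rule
    bits_per_token = code_length_bits / len(probs)
    perplexity = 2 ** bits_per_token                      # perplexity is exponentiated code length

    print(f"archive: {code_length_bits:.2f} bits "
          f"({bits_per_token:.2f} bits/token, perplexity {perplexity:.2f})")
    # MDL then picks the M minimizing L(M) + code_length_bits: decompressor plus archive.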
Whether or not you agree, a lot of people do. There is a trivial sense in which a perfect compression algorithm is a perfect predictor: if it ever mispredicted anything, that error would make it a sub-optimal compressor for any corpus that included that utterance (there's a toy demonstration of this direction just below). And there are plenty of ways to prove that a perfect predictor can be used as an optimal actor, a.k.a. an AGI: if you ever predicted the outcome of an event worse than is fundamentally necessary given limited observations or quantum shenanigans, that would be a sub-optimal prediction, and hence you would be a sub-optimal compressor.
Where a lot of us get off the fence is when we remove "perfect" from the mix. I don't personally think that performance on a compression task correlates very strongly with what we'd generally consider as intelligence. I suspect good AGIs will function as excellent compression routines, but I don't think optimizing on compression ratio will necessarily be fruitful. And I think it's quite possible that a more powerful AGI could perform worse at compression than a weaker one, for a million reasons.
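The compressor-as-predictor direction is easy to demo with an off-the-shelf compressor (zlib is nobody's idea of a perfect compressor, but the principle shows through):

    import zlib

    def compressed_len(data: bytes) -> int:
        return len(zlib.compress(data, level=9))

    def predict_next(context: bytes) -> int:
        """Use a compressor as a predictor: the continuation that compresses
        best is the one the compressor implicitly rated most probable."""
        return min(range(256), key=lambda b: compressed_len(context + bytes([b])))

    context = b"abc" * 20
    print(chr(predict_next(context)))  # a decent compressor should print 'c'

The converse direction, whether optimizing the compression ratio buys you intelligence, is exactly where the disagreement above lives.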
If you had a perfect lossless compressor (one that could compress anything down to its fundamental Kolmogorov complexity), you would also definitionally have an oracle that could compute any computable function.
Intelligence would be a subset of the capabilities of such an oracle.
No, because getting a UTM to compute any computable function only requires a tiny set of instructions: generate every possible permutation of symbols over ever-increasing lengths of tape and run them all (see the sketch below).
It will run forever, and I would agree that in that set there will be an infinite number of programs that, when run, would be deemed intelligent, but that does not make the computer itself intelligent absent first stumbling on one of those specific programs.
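The enumeration in question is classic dovetailing; a sketch (the run interpreter is a placeholder I'm assuming, not any real UTM encoding):

    from itertools import count, product

    def all_programs():
        """The 'tiny set of instructions': emit every bit string, shortest first."""
        for length in count(1):
            for bits in product("01", repeat=length):
                yield "".join(bits)

    def run(program: str, steps: int) -> None:
        """Placeholder for a UTM interpreter that would execute `program`
        for at most `steps` steps; assumed here purely for illustration."""
        pass

    def dovetail(rounds: int) -> None:
        """Run program i for j steps with ever-growing i and j, so every
        program eventually gets unbounded time without blocking the rest."""
        gen, admitted = all_programs(), []
        for budget in range(1, rounds + 1):
            admitted.append(next(gen))      # admit one new program per round
            for program in admitted:
                run(program, steps=budget)  # revisit old programs with a bigger budget

    dovetail(10)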
EDIT: Put another way, if the potential to be made to compute in a way we would deem intelligent is itself intelligence, then a lump of random particles is intelligent because it could be rearranged into a brain.
The analogy is not a random lump of particles but all particles in all configurations, of which you are a subset. Is the set of you plus a rock intelligent?
This is a fundamentally flawed argument: a computer is not in all of its possible execution states at once, and naively iterating over the set of possible states might well not stumble on a single intelligence before the heat death of the universe.
If we were picking me out of an equally large set of objects, then I'd argue that no, the set is not meaningfully intelligent, because the odds of picking me would be small enough that it'd be unreasonable in the extreme to assign the set any of my characteristics.
I haven't read the papers, but this sentence strikes me as a bit of a leap. Maybe for very constrained definitions of AGI?