
More interesting would be a semantic compressor: one that didn't necessarily return the exact words of the corpus, but instead expanded a small amount of compressed data into a document with very similar meaning.

E.g., "Because trees tend to have a low albedo, removing forests would tend to increase albedo and thereby cool the planet" is a sentence from the file that would need to be compressed. It should be acceptable for the decompressed text to read, "Eliminating large groups of trees would improve surface reflectivity and provide support for a planetary cooling trend," or to employ any number of similar phrasings.

Any form of AGI arrived at through Hutter's exercise would be more akin to the intelligence an eidetic person shows when reciting the telephone book, and computers are already good at that.



> Why Not Lossy Compression?

> Although humans cannot compress losslessly, they are very good at lossy compression: remembering that which is most important and discarding the rest. Lossy compression algorithms like JPEG and MP3 mimic the lossy behavior of the human perceptual system by discarding the same information that we do. For example, JPEG codes the color signal of an image at a lower resolution than brightness because the eye is less sensitive to high spatial frequencies in color. But we clearly have a long way to go. We can now compress speech to about 8000 bits per second with reasonably good quality. In theory, we should be able to compress speech to about 20 bits per second by transcribing it to text and using standard text compression programs like zip.

> Humans do poorly at reading text and recalling it verbatim, but do very well at recalling the important ideas and conveying them in different words. It would be a powerful demonstration of AI if a lossy text compressor could do the same thing. But there are two problems with this approach.

> First, just like JPEG and MP3, it would require human judges to subjectively evaluate the quality of the restored data.

> Second, there is much less noise in text than in images and sound, so the savings would be much smaller. If there are 1000 different ways to write a sentence expressing the same idea, then lossy compression would only save log2 1000 ≈ 10 bits. Even if the effect were large, requiring compressors to code the explicit representation of ideas would still be fair to all competitors.

http://mattmahoney.net/dc/rationale.html
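
A quick sanity check on those two figures, as a rough sketch (the 150 words-per-minute speaking rate, ~6 characters per word, and ~1 bit per character for well-compressed English are my assumptions, not figures from the page):

    import math

    # Assumed inputs, not from Mahoney's page: conversational speech
    # runs around 150 words/minute, an English word averages about 5
    # letters plus a space, and strong text compressors approach
    # Shannon's estimate of roughly 1 bit per character of English.
    WORDS_PER_MINUTE = 150
    CHARS_PER_WORD = 6
    BITS_PER_CHAR = 1.0

    chars_per_second = WORDS_PER_MINUTE * CHARS_PER_WORD / 60
    print(f"transcribed speech: ~{chars_per_second * BITS_PER_CHAR:.0f} bits/s")  # ~15

    # Savings from collapsing 1000 paraphrases of one idea into one code:
    print(f"paraphrase savings: ~{math.log2(1000):.1f} bits")  # ~10.0

Both land in the same ballpark as the 20 bits per second and 10 bits quoted above.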


Given enough time and compute, an AGI could enumerate all possible ways to say the sentence in your example, then store a number indicating which of those sentences exactly matches the input.

That number would take far fewer bits than the text of the sentence itself.
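
A minimal sketch of that scheme (the two-entry paraphrase list is a hypothetical stand-in for the full enumeration an AGI would generate):

    import math

    # Hypothetical enumeration of every way to phrase one idea; in the
    # proposal this list would be regenerated, not stored.
    paraphrases = [
        "Because trees tend to have a low albedo, removing forests would"
        " tend to increase albedo and thereby cool the planet",
        "Eliminating large groups of trees would improve surface"
        " reflectivity and provide support for a planetary cooling trend",
        # ... every other phrasing of the same idea
    ]

    def compress(sentence):
        # Store only the position of the exact input in the enumeration.
        return paraphrases.index(sentence)

    def decompress(index):
        return paraphrases[index]

    # An index into N paraphrases costs ceil(log2(N)) bits, versus
    # hundreds of bits for the sentence's raw text.
    print(math.ceil(math.log2(len(paraphrases))))

Note that this only works if the decompressor can reproduce the identical enumeration in the same order.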


That number would have to identify the sentence among all possible sentences, though, not just among the paraphrases of one idea; otherwise the decompressor wouldn't know which enumeration to index into. I'd expect it to be considerably longer in that case.
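
Rough arithmetic behind that objection (the 27-symbol alphabet and ~120-character sentence length are assumptions for illustration):

    import math

    # Without shared knowledge of which paraphrase set applies, the
    # number must select one sentence out of all strings of similar
    # length, not just out of one idea's paraphrases.
    ALPHABET_SIZE = 27     # assumed: 26 letters plus space
    SENTENCE_LENGTH = 120  # roughly the albedo sentence above

    bits = SENTENCE_LENGTH * math.log2(ALPHABET_SIZE)
    print(f"~{bits:.0f} bits")  # ~571 bits, versus ~10 to pick a paraphrase

Most of those bits go to specifying the idea itself, which is exactly the part the enumeration trick was supposed to avoid storing.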



