
More interesting would be a semantic compressor: one that didn't necessarily return the exact words of the corpus, but instead expanded a small amount of compressed data into a document with very similar meaning.

E.g., "Because trees tend to have a low albedo, removing forests would tend to increase albedo and thereby cool the planet" is a sentence from the file that would need to be compressed. It should be acceptable for the decompressed text to read, "Eliminating large groups of trees would improve surface reflectivity and provide support for a planetary cooling trend," or to employ any number of similar phrasings.

Any form of AGI arrived at through Hutter's exercise would be more akin to the intelligence an eidetic person shows when reciting the telephone book, and computers are already good at that.



> Why Not Lossy Compression?

> Although humans cannot compress losslessly, they are very good at lossy compression: remembering that which is most important and discarding the rest. Lossy compression algorithms like JPEG and MP3 mimic the lossy behavior of the human perceptual system by discarding the same information that we do. For example, JPEG codes the color signal of an image at a lower resolution than brightness because the eye is less sensitive to high spatial frequencies in color. But we clearly have a long way to go. We can now compress speech to about 8000 bits per second with reasonably good quality. In theory, we should be able to compress speech to about 20 bits per second by transcribing it to text and using standard text compression programs like zip.

> Humans do poorly at reading text and recalling it verbatim, but do very well at recalling the important ideas and conveying them in different words. It would be a powerful demonstration of AI if a lossy text compressor could do the same thing. But there are two problems with this approach.

> First, just like JPEG and MP3, it would require human judges to subjectively evaluate the quality of the restored data.

> Second, there is much less noise in text than in images and sound, so the savings would be much smaller. If there are 1000 different ways to write a sentence expressing the same idea, then lossy compression would only save log2 1000 ≈ 10 bits. Even if the effect were large, requiring compressors to code the explicit representation of ideas would still be fair to all competitors.

http://mattmahoney.net/dc/rationale.html
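
A quick sanity check on those two figures, as a rough sketch (the 150 words-per-minute speaking rate, ~6 characters per word, and ~1 bit per character for well-compressed English are my assumptions, not figures from the page):

    import math

    # Assumed inputs, not from Mahoney's page: conversational speech
    # runs around 150 words/minute, an English word averages about 5
    # letters plus a space, and strong text compressors approach
    # Shannon's estimate of roughly 1 bit per character of English.
    WORDS_PER_MINUTE = 150
    CHARS_PER_WORD = 6
    BITS_PER_CHAR = 1.0

    chars_per_second = WORDS_PER_MINUTE * CHARS_PER_WORD / 60
    print(f"transcribed speech: ~{chars_per_second * BITS_PER_CHAR:.0f} bits/s")  # ~15

    # Savings from collapsing 1000 paraphrases of one idea into one code:
    print(f"paraphrase savings: ~{math.log2(1000):.1f} bits")  # ~10.0

Both land in the same ballpark as the 20 bits per second and 10 bits quoted above.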


Given enough time and compute, an AGI could enumerate all possible ways to say the sentence in your example, then store a number indicating which of those sentences exactly matches the input.

That number would take far fewer bits than the text of the sentence itself.
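
A minimal sketch of that scheme (the two-entry paraphrase list is a hypothetical stand-in for the full enumeration an AGI would generate):

    import math

    # Hypothetical enumeration of every way to phrase one idea; in the
    # proposal this list would be regenerated, not stored.
    paraphrases = [
        "Because trees tend to have a low albedo, removing forests would"
        " tend to increase albedo and thereby cool the planet",
        "Eliminating large groups of trees would improve surface"
        " reflectivity and provide support for a planetary cooling trend",
        # ... every other phrasing of the same idea
    ]

    def compress(sentence):
        # Store only the position of the exact input in the enumeration.
        return paraphrases.index(sentence)

    def decompress(index):
        return paraphrases[index]

    # An index into N paraphrases costs ceil(log2(N)) bits, versus
    # hundreds of bits for the sentence's raw text.
    print(math.ceil(math.log2(len(paraphrases))))

Note that this only works if the decompressor can reproduce the identical enumeration in the same order.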


That number would have to identify the sentence among all possible sentences, though, not just among the paraphrases of one idea; otherwise the decompressor wouldn't know which enumeration to index into. I'd expect it to be considerably longer in that case.
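
Rough arithmetic behind that objection (the 27-symbol alphabet and ~120-character sentence length are assumptions for illustration):

    import math

    # Without shared knowledge of which paraphrase set applies, the
    # number must select one sentence out of all strings of similar
    # length, not just out of one idea's paraphrases.
    ALPHABET_SIZE = 27     # assumed: 26 letters plus space
    SENTENCE_LENGTH = 120  # roughly the albedo sentence above

    bits = SENTENCE_LENGTH * math.log2(ALPHABET_SIZE)
    print(f"~{bits:.0f} bits")  # ~571 bits, versus ~10 to pick a paraphrase

Most of those bits go to specifying the idea itself, which is exactly the part the enumeration trick was supposed to avoid storing.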



