Hutter Prize for compressing human knowledge (hutter1.net)
213 points by kelseyfrog on Sept 13, 2023 | 215 comments



I think the mistake here is to require lossless compression.

Humans and LLMs only do lossy compression. I think lossy compression might be more critical to intelligence. The ability to forget, change your synapses or weights, is crucial to being able to adapt to change.


Some background before I explain why your suggestion is "not even wrong". If you can predict the next X bits given the previous Y bits of observation, you don't need to store the next X bits: the decompressor can just run the same prediction algorithm you used and write out its predictions without correction. This is the same as general-purpose AI at a high level. If you can predict the next X bits given the previous Y bits of observation, you can make a reasoned decision ("choose option A now so that the next X bits have a certain outcome").

The above is actually the premise of the competition and the reason it exists. What I've said above is laid out in detail in academic papers by Marcus Hutter et al., who runs this competition. That is, lossless compression can be a scorecard of prediction which is the same as AGI.

Now, saying "they should just make it lossy" misses the point. Do you know how to turn lossy data into lossless? You store some data every time your prediction is wrong, i.e. store data wherever there is loss, to make the result lossless. This is arithmetic coding in a nutshell. You can turn any lossy predictor into a lossless compressor with arithmetic coding, and it will require more data the more loss you have. The lossless requirement gives us a scorecard of how well the lossy prediction worked.

If you ask for this to be lossy compression, you throw out that scorecard and bring in a whole lot of unwanted subjectivity.
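To make the scorecard idea concrete, here is a rough sketch of the accounting (the function names and the toy predictor are made up purely for illustration; real entries are far more sophisticated). The ideal arithmetic-coding cost of a symbol is about -log2(p), where p is the probability the predictor assigned to the symbol that actually occurred, so better predictions mean fewer correction bits:

    import math

    def code_length_bits(text, predict):
        # Ideal arithmetic-coding cost: sum of -log2(p) over the symbols that
        # actually occur, where p comes from the shared prediction algorithm.
        total = 0.0
        for i, ch in enumerate(text):
            p = predict(text[:i], ch)   # P(next char = ch | context so far)
            total += -math.log2(p)
        return total

    def toy_predict(context, ch):
        # Purely illustrative numbers: confident after "q", uninformed otherwise.
        if context.endswith("q") and ch == "u":
            return 0.95               # ~0.07 bits to confirm a good prediction
        return 1.0 / 27               # ~4.75 bits for an uninformed guess

    print(code_length_bits("queue", toy_predict))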


> Some background before I explain why your suggestion is "not even wrong".

The phrase you quote is generally used to imply a stupid or unscientific suggestion. Your subsequent comments about what you think AGI is carry a certitude that isn't warranted.

It's good that you are trying to supply knowledge where you think it is lacking, and I understand there are fora where this sort of public-school lecturing is amusing, but I think your tone is misplaced here.


Years-long compression challenge with dozens of geniuses participating: exists.

Random person on the internet: let me improve this thing I've never heard of by using the one fact I know about compression (there are two kinds).

It's absolute hubris and a waste of everyone's time to chime in with low value, trash comments like "they should make it lossy". It's not unreasonable at all to take a snarky tone in response. "not even wrong" absolutely applies here, and they carefully, patiently, and in great detail explained why.


I often feel the same way when discussions pop up here or on other forums, about topics I'm familiar with. Like randos declaring that researchers in deep learning are "obviously doing it wrong" and they should instead do X, where X is like an entire subfield existing for years with a lot of activity, etc.

So I get where you're coming from. But I'd suggest that a place like HN is in fact a place for random people to inject their half-baked takes. It is just a discussion board where lots of the comments will be uninformed or wrong. Take it or leave it. If you want something else, you need to find more niche communities that are - by their nature - more difficult to find and less public, including IRL discussion, clubs, conferences etc. But it has its use: we (you and me) can jump into any thread and type out what we think after 2 minutes and get some response. But of course someone even more novice might think that we know more than just that 2-minute consideration, and they learn our junk opinion as if it were the result of long experience. It's unavoidable, since nobody knows who the rest of the commenters are.

Online discussions are incredibly noisy, and often even the people who seem to use the jargon and seem knowledgeable to the outsider can be totally off-base and essentially just imitate how the particular science or field "sounds like". Unfortunately, you only learn this gradually and over a long time. If you learn stuff through forums, Reddit, HN, blogs, substacks etc. it can be very misleading from the first-person experience because you will soak up lots of nonsense as well. Reading actual books and taking real courses is still very much relevant.

HN and co. are more like the cacophony of what the guy on the street thinks. Very noisy, and only supposed to be a small treat on top of rigorous study. You shouldn't expect to see someone truly breaking new ground in this comment thread. If it disturbs you, you can skip the comments. But trying to "forbid" it, or gatekeep, is futile. It's like trying to tell people in a bar not to discuss how bad the soccer team's coach is, because they don't really have the relevant expertise. Yeah, sure, but people just wanna chat and throw ideas around. It's on the reader to know not to take it too seriously.


ISWYM but the problem isn't really people making suggestions, but the way they make suggestions that grates. If Mr Internet-random-guy wants to introduce the issue of lossy compression then by all means ask why not, but don't say they should.

It comes across as arrogance, probably because it is, and then it sucks up plenty of the time of others who actually do know the subject, putting things right.

Even more bloody annoying is when people ask questions where even the most cursory web search would get the answer. Wikipedia is usually a very good place to start. I guess that for these people, the cost is externalised to others' wasted time.

Then again, we all take turns at being the stupid one, so who am I to complain.


It's inherent in reading comments. And it's also inherent in encountering mere mortals in the real world. And remember how the most annoying and stupid people keep going on about how all the people they meet are stupid and annoying. There's no point in piling on another layer. Close the tab, or comment constructively and charitably. Else you end up with stuff like the badX subreddits (badhistory, badphilosophy), which get their adrenaline/dopamine fix by seeking out explanations they see as ignorant/naive/arrogant and sneering at them while self-aggrandizing and feeling like they are in the inner circle who know it all.

The other thing is, you never see all the people who do go to Wikipedia, google or check a book. They won't comment "Hello I'm not commenting now because I went to Wikipedia". They just don't comment.

And Cunningham's Law states "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."

People are more prone to comment out of frustration than other feelings.


You may find the book "The Structure of Scientific Revolutions" interesting if you have not read it. The author posits that it isn't the people entrenched in a field who will offer breakthrough advancements, but instead outsiders looking in.

More often than not, they are aggressively rebutted, which leads to the belief that science progresses one funeral at a time. Perhaps it is you who needs to guard against hubris?

https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Re...


I disagree with you; therefore by your own logic, I am right.


I wish I could claim the logic, but I am a nobody who knows nothing. The author of the book, however, has impressed people for over 70 years with this line of thought and I too agree with him.


It depends on your goal.

If your goal is compressing human knowledge, then you do want to avoid wasting bits on the details of wording that were random chance.

The problem is inability to objectively judge such a compression, not the mere fact that it won't be bit-perfect.

It is not "not even wrong".


You didn't even read the thread though?

All the methods in this competition that are competitive DO use lossy compression.


I did read the thread. I'm objecting to the idea that stacking lossless corrections onto lossy compression and then measuring the total is a good way to measure what we want to measure here, wrt human knowledge. It may be the best we have, but it's not good.

See my other comment here, that I probably shouldn't have put in a reply to a reply to a dead comment: https://news.ycombinator.com/item?id=37506099


Why should we care what you think, though? I'm not being nasty, but unless you have a reputation in the field you have to give some cogent argument, or it's just some random, possibly uninformed opinion.


I’m sorry you see it as a trash comment, but I hope the discussion it provoked was to your liking.

I found it enlightening.


There is nothing wrong with jumping in and saying stuff that probably isn't right on an internet forum! That doesn't make it valuable or insightful, it's just how forums work. Your comment was cool with me and you shouldn't feel like you need to change at all.

That said, the response to your comment was insightful and made interesting points. You did in fact kick off a very interesting conversation!


The phrase ["not even wrong"] is generally used to imply a stupid or unscientific suggestion.

An unscientific suggestion is exactly what was offered. Forgive my ignorance, but why was the tone of the message misplaced, and what was the tone?


I feel your tone-policing is misplaced here. When someone is that ignorant or foolish they should be told so, so that they can hopefully recalibrate or something. There's enough inchoate nonsense on the Internet that I appreciate the efforts to keep discussion at a little higher level here.


> There's enough inchoate nonsense on the Internet

I agree 100%. But the top-level comment is not an example of such.

However, the reply in question – and your comment – are certainly examples of the kind of tone-deaf, needlessly aggressive, hostile, confrontational, and borderline malicious posts I wish I could cleanse the Internet of wholesale.


Thanks for the feedback, I disagree.

> tone-deaf, needlessly aggressive, hostile, confrontational, and borderline malicious posts I wish I could cleanse the Internet of wholesale.

I have never said this before, but maybe you're a little too sensitive?

In any event, I feel your characterization of my comment borders on ad hominem and certainly it seems to violate the site guideline to interpret comments charitably.

Good day.


> When someone is that ignorant or foolish ...

Dude. Calling people "ignorant or foolish" isn't exactly great either. :(


first two comments are the classic HN banter I’m here for, next two have stink on them I do not like


Banter shouldn't be insulting for no good reason.


Banter by definition is teasing, which is insulting. That, along with being humorous in context and not meant to be taken seriously, is what makes it banter rather than any other form of exchange.

Let's just go to the dictionary:

Banter (n): the playful and friendly exchange of teasing remarks.

Teasing (adj): intended to provoke or make fun of someone in a playful way.


And in specific, "not even wrong" is a total shutdown, not playful.


Oh I agree 100%, that wasn't banter, it was a grumpy shutdown.


Is there any evidence that arithmetic coding works at the level of concepts?

As a thought experiment, suppose Copernicus came up with this Hutter prize idea and declared that he would provide the award to whoever could compress the text of his book on the epicycle-based movement of planets around the sun (De revolutionibus orbium coelestium).

Today we can explain the actual motion to high accuracy with a single sentence that would have been understandable in that age: "A line drawn from the sun to any planet sweeps out equal areas in equal times."

This however is mostly useless in attempting to win the Copernican Hutter prize. Predicting the wording of some random human's choosing (especially at length) is very far removed from the ability to predict in general.


Arithmetic coding isn't the key thing here. That's just a bit-wise algorithm. You predict the next bit is a 1 with 90% certainty? Then you don't need to store much data with arithmetic coding. That's all arithmetic coding is here.

What you're getting at is the 'predictor' that feeds into the arithmetic coder, and that's wide open and can work any way you want it to. LLMs absolutely have context, which is similar to what you're asking about, and they are good predictors of output given complex input (pass GPT a mathematical series and ask it what comes next; if it's right, that's really helpful for compression, as you wouldn't need to store the whole series in full).
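As a minimal sketch of that split (an adaptive toy predictor standing in for the LLM; the names are hypothetical and not from any winning entry), the decompressor runs the same model with the same updates, so only the per-symbol -log2(p) cost has to be stored:

    import math
    from collections import Counter

    class Order0Predictor:
        # Adaptive order-0 model: P(bit) estimated from counts seen so far.
        def __init__(self):
            self.counts = Counter({0: 1, 1: 1})  # Laplace smoothing

        def prob(self, bit):
            return self.counts[bit] / sum(self.counts.values())

        def update(self, bit):
            self.counts[bit] += 1

    def ideal_compressed_bits(bits):
        model = Order0Predictor()
        total = 0.0
        for b in bits:
            total += -math.log2(model.prob(b))  # what the arithmetic coder pays
            model.update(b)                     # decoder makes the same update
        return total

    print(ideal_compressed_bits([1] * 90 + [0] * 10))  # skewed stream: well under 100 bits
    print(ideal_compressed_bits([0, 1] * 50))          # 50/50 stream: about 1 bit per symbol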


So you don't think that a wiki article about this could be made smaller by encoding this info in a few logical steps, and adding some metadata on what kind of sentence should follow what? The decompressing part is the AI part. Where to place a comma can be added at constant cost for any contending program.


Sure, my point is more that this comma-adding business dwarfs any real prediction.

Suppose there were a superintelligence that figured out the theory of everything for the universe. It's unclear that this would actually help with the task. You could likely derive things like gravitation, chemistry, etc., but the vast majority of your bits would still be spent attempting to match the persona and wording of the various Wikipedia authors.

This superintelligence would be masked by some LLM that is slightly better at faking human wording.


But that comma will have the exact same price between 2 contending lossy compressions. In fact, it is a monotonic function of the difference, so the better your lossy compression is, the smaller your arithmetic correction will be, making you measure the correct thing in an objective way.

It’s like, smart people have spent more than 3 minutes on this problem already.


> But that comma will have the exact same price between 2 contending lossy compressions.

Why do you think that? Do you have proof of that?

> making you measure the correct thing

If we're trying to measure knowledge, then the exact wording is not part of being correct.

Very often you will have to be less correct to match Wikipedia's wording. A better lossy encoding of knowledge would have a higher cost to correct it into a perfect match of the source.


> Why do you think that? Do you have proof of that?

You want to encode “it’s cloudy, so it’ll rain”. Your lossy, intelligent algorithm comes up with “it is cloudy so it will rain”. You save the diff and apply it. If another, worse algorithm can only produce “it’s cloudy so sunny”, it will have to pay more in the diff, which scales with the number of differences between the produced and original string.

You can be less correct if that cumulatively produces better results; that's the beauty of the problem. The last "mile" difference is the same cost for everyone, as a function of the difference.
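A crude way to see that scaling, using the strings from above and a plain text diff as a stand-in for a real arithmetic-coded correction (the helper name is made up):

    import difflib

    original   = "it's cloudy, so it'll rain"
    good_guess = "it is cloudy so it will rain"
    bad_guess  = "it's cloudy so sunny"

    def correction_size(guess, target):
        # Rough proxy for the "last mile" cost: characters a diff must insert
        # or replace to turn the lossy guess back into the original.
        sm = difflib.SequenceMatcher(None, guess, target)
        return sum(j2 - j1 for op, _, _, j1, j2 in sm.get_opcodes()
                   if op in ("replace", "insert"))

    print(correction_size(good_guess, original))  # small correction
    print(correction_size(bad_guess, original))   # noticeably larger correction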


But that's just two random examples.

How about "it is cloudy so it will rain" and "it's cloudy, so sunny"? Then since we're looking at the commas for this argument, the second algorithm is paying less for comma correction even though it's much wronger.

You seem to be assuming that a less intelligent algorithm is worse at matching the original text in every way, and I don't think that assumption is warranted.

I'll rephrase the last line from my earlier post: What if wikipedia is using the incorrect word in a lot of locations, and the smart algorithm predicts the correct word? That means the smart algorithm is a better encoding of knowledge, but it gets punished for it.

In that case the last mile cost is higher for a smart algorithm.

And even when the last mile cost is roughly the same, the bigger of a percentage it becomes, the harder it is to measure anything else.

And it shuns any algorithm that's (for example) 5% better at knowledge and 2% worse at the last mile, even though such a result should be a huge win. There are lots of possible ways to encode knowledge that will drag things just a bit away from the original arbitrary wording. So even if you use the same sub-algorithm to do the last mile, it will have to spend more bits. I don't think this is an unlikely scenario.


> Then since we're looking at the commas for this argument, the second algorithm is paying less for comma correction even though it's much wronger.

And? It will surely have to be on average more correct than another competitor, otherwise its size will be much larger.

> What if wikipedia is using the incorrect word in a lot of locations,

Then you write s/wrongword/goodword for a few more bytes. It won't be a deciding factor, but to beat trivial compressions you do have to be smarter than plainly looking at the data - that's the point.

> And it shuns any algorithm that's (for example) 5% better at knowledge and 2% worse at the last mile

That's not how it works. With all due respect, much smarter people than us have been thinking about this for many years - let's not try to make up reasons why it's wrong after thinking about it badly for 3 minutes.


> And? It will surely have to be on average more correct than another competitor, otherwise its size will be much larger.

It's possible to have an algorithm that is consistently closer in meaning but also consistently gets commas (or XML) wrong and pays a penalty every time.

Let's say both that algorithm and its competitor are using 80MB at this stage, before fixups.

Which one is more correct?

If you say "the one that needs fewer bytes of fixups is more correct", then that is a valid metric but you're not measuring human knowledge.

A human knowledge metric would say that the first one is a more correct 80MB lossy encoding, regardless of how many bytes it takes to restore the original text.

> Then you write s/wrongword/goodword for a few more bytes. It won't be a deciding factor

You can't just declare it won't be a deciding factor. If different algorithms are good at different things, it might be a deciding factor.

> That's not how it works. With all due respect, much smarter people than us have been thinking about this for many years - let's not try to make up reasons why it's wrong after thinking about it badly for 3 minutes.

Prove it!

Specifically, prove they disagree with what I'm saying.


You could still come up with a scorecard for lossy compression, for example area under the rate distortion curve. It would be very hard to calculate.


Lossless compression _is_ a scorecard for lossy compression. Thanks to the ideas from arithmetic coding.


There’s no way I could reproduce a calculus textbook verbatim, but I can probably prove all the important theorems in it.

Even then, given half of any sentence in the book, I don’t rate my chances of reproducing the next half. That’s more a question of knowing the author’s style than knowing calculus itself.

Does arithmetic coding capture all of that?


Let's say you would come up with the exact same text given an initial seed (say, what you ate that day). Then it really is just an arithmetic function of the difference, as it is deterministic.

Now, who will get closer to the original with that added arithmetic coding? You, knowing the proofs, or a random guy who doesn't even speak the language? Doesn't it then measure intelligence, all else being equal?


Yep


Or just require the results to be 99.99 percent accurate (within some edit distance)


If it's 99.99% accurate, arithmetic coding would need to store next to no data.

Arithmetic coding is optimal at turning probabilistic predictions into lossless data. There's provably no way to do it more efficiently than arithmetic coding. The data it needs for corrections is smaller the better the predictions are.

So given this, why even dwell on ways that add any form of subjectivity? Arithmetic coding is there. It's a simple algorithm.


"... lossless compression can be a scorecard of prediction which is the same as AGI."

Having not read the papers, this sentence strikes me as a bit of a leap. Maybe for very constrained definitions of AGI?


There's a section in the link above, "Further Recommended Technical Reading relevant to the Compression=AI Paradigm", and they define it in a reasonably precise mathematical way. It's well accepted at this point. If you can take input and predict what will happen given some options, you can direct towards a certain goal. This ability to direct towards a goal effectively defines AGI. "Make paperclips", and the AI observes the world, works out what decisions need to be made to optimize paperclip output, and then starts taking decisions to output paperclips - that is essentially what we mean by AGI, and prediction is a piece of it.

I have no stake in this btw, I've just had a crack at the above challenge in my younger days. I failed but I want to get back into it. In theory a small LLM without any existing training data (for size) that trains itself on the input as it passes predictions to an arithmetic coder that optimally compresses, with the same process on the decompression side, should work really well here. But I don't have the time these days. Sigh.


> This ability to direct towards a goal effectively defines AGI

No it doesn't, though it may be argued to be a requirement.

That's the point of the previous commenter - that you are making unjustified assertions using an extrapolation of the views of some researchers. Reiterating it with a pointer to why they believe that to be the case doesn't make it more so.

If that's your favoured interpretation, fine, but that's all it is at this point.


Hey, don't pin this on me, it's not my assertion.

Go argue with the scientists who state pretty much what I just said verbatim, including full links with proofs, in http://prize.hutter1.net/hfaq.htm#ai :)

> One can prove that the better you can compress, the better you can predict; and being able to predict [the environment] well is key for being able to act well. Consider the sequence of 1000 digits "14159...[990 more digits]...01989". If it looks random to you, you can neither compress it nor can you predict the 1001st digit. If you realize that they are the first 1000 digits of π, you can compress the sequence and predict the next digit. While the program computing the digits of π is an example of a one-part self-extracting archive, the impressive Minimum Description Length (MDL) principle is a two-part coding scheme akin to a (parameterized) decompressor plus a compressed archive. If M is a probabilistic model of the data X, then the data can be compressed (to an archive of) length log(1/P(X|M)) via arithmetic coding, where P(X|M) is the probability of X under M. The decompressor must know M, hence has length L(M). One can show that the model M that minimizes the total length L(M)+log(1/P(X|M)) leads to best predictions of future data. For instance, the quality of natural language models is typically judged by its Perplexity, which is equivalent to code length. Finally, sequential decision theory tells you how to exploit such models M for optimal rational actions. Indeed, integrating compression (=prediction) into sequential decision theory (=stochastic planning) can serve as the theoretical foundations of super-intelligence (brief introduction, comprehensive introduction, full treatment with proofs).
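A tiny worked restatement of that two-part trade-off, with made-up numbers (this is just the L(M) + log2(1/P(X|M)) formula from the quote, written with log-probabilities so the values stay manageable):

    # Two-part (MDL-style) code length: L(M) + log2(1/P(X|M)).
    def total_code_length(model_bits, log2_prob_of_data):
        return model_bits - log2_prob_of_data

    # Illustrative numbers only: a tiny model that predicts the data poorly
    # versus a bigger model that predicts it much better.
    small_weak = total_code_length(model_bits=1_000, log2_prob_of_data=-50_000)
    big_strong = total_code_length(model_bits=20_000, log2_prob_of_data=-10_000)

    print(small_weak)   # 51000 bits
    print(big_strong)   # 30000 bits: wins despite the larger L(M)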


> Hey, don't pin this on me, it's not my assertion.

But it is your assertion, wherever you've picked up the idea from.

> ...which is the same as AGI...

> ...effectively defines AGI...

No it isn't, and no it doesn't. Your language is too strong in its claims.


Whether or not you agree, a lot of people do. There is a trivial sense in which a perfect compression algorithm is a perfect predictor (if it ever mispredicted anything, that error would make it a sub-optimal compressor for a corpus that included that utterance), and there are plenty of ways to prove that a perfect predictor can be used as an optimal actor (if you ever mispredicted the outcome of an event worse than what might be fundamentally necessary due to limited observations or quantum shenanigans, that would be a sub-optimal prediction and hence you would be a sub-optimal compressor), a.k.a. an AGI.

Where a lot of us get off the fence is when we remove "perfect" from the mix. I don't personally think that performance on a compression task correlates very strongly with what we'd generally consider as intelligence. I suspect good AGIs will function as excellent compression routines, but I don't think optimizing on compression ratio will necessarily be fruitful. And I think it's quite possible that a more powerful AGI could perform worse at compression than a weaker one, for a million reasons.


If you had a perfect lossless compressor (one that could compress anything down to its fundamental Kolmogorov complexity), you would also definitionally have an oracle that could compute any computable function.

Intelligence would be a subset of the capabilities of such an oracle.


A universal Turing machine can compute any computable function. That doesn't make it intelligent, because it won't without direction.


Then expand your target to a universal Turing machine plus instructions to compute any computable function. Do you consider it intelligent then?


No, because instructions to compute any computable function for a UTM only require a tiny set of instructions on how to generate every possible permutation over ever-increasing lengths of tape.

It will run forever, and I would agree that in that set there will be an infinite number of functions that when run would be deemed intelligent, but that does not make the computer itself intelligent absent first stumbling on one of those specific programs.

EDIT: Put another way, if the potential to be made to compute in a way we would deem intelligent is itself intelligence, then a lump of random particles is intelligent because it could be rearranged into a brain.


The analogy is not a random lump of particles but all particles in all configurations, of which you are a subset. Is the set of you plus a rock intelligent?


This is a fundamentally flawed argument, because a computer is not in all the states it can execute at once, and naively iterating over the set of possible states might well not have found a single intelligence before the heat death of the universe.

If we were picking me out of an equally large set of objects, then I'd argue that no, the set is not meaningfully intelligent, because the odds of picking me would be negligible enough that it'd be unreasonable in the extreme to assign the set any of my characteristics.


[flagged]


Sorry, but I hope my long-winded explanation explains it.

In computer science we have a way to score how good lossy compression is. That way is to make it lossless and look at how much data the arithmetic coder needed to correct it from lossy to lossless.

This is a mathematically perfect way to judge (you can't correct lossy data any more efficiently than an arithmetic coder). All the entries here do in fact make probabilistic predictions on the data and they do all use arithmetic coding. So the suggestion misses a key point of CS involved here. I don't mean to be rude about it but the idea does need correcting.


> This is a mathematically perfect way to judge

Only if you're using a very particular and honestly circular-sounding definition of "good".

Some deviations are more important than others, even if you're looking at deviations that take the same amount of data to correct.

Think about film grain. Some codecs can characterize it, remove it when compressing, and then synthesize new visually matching grain when decompressing.

Let's say it takes a billion bytes to turn the lossy version back into the lossless version.

The version with synthetic film grain still needs a billion bytes or maybe even slightly more bytes, even if the synthetic grain is 95% as good as real grain. The cost to turn it lossless is the wrong metric.


Your comment is far worse, lol. His comment actually supports the GP's suggestion, just from a different angle.

It looks like LLMs are like compression algorithms, with strengths and weaknesses in different things.

Losslessness doesn't always equate to usefulness. But yeah, maybe a different competition for this.


Good lossy compression can be used to achieve lossless compression.

(information theory)

The more accurate the lossy compression is, the smaller the difference between the actual data (lossless) and the approximation. The smaller the difference, the fewer bits required to restore the original data.

So a naive approach is to use the LLM to approximate the text (this would need to be deterministic: zero temperature with a preset seed), then subtract that from the original data. Store this diff, then add it back to restore the original data bit-for-bit.
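A minimal sketch of that pipeline (the deterministic_model stub stands in for a zero-temperature LLM with a fixed seed; the diff format here is deliberately naive):

    def deterministic_model() -> str:
        # Stand-in for "run the LLM deterministically and take its output".
        return "The quick brown fox jumps over a lazy dog."

    def compress(original: str, predicted: str):
        # Store only the positions where the prediction is wrong, plus any tail
        # the prediction didn't cover, plus the true length.
        fixes = [(i, c) for i, (c, p) in enumerate(zip(original, predicted)) if c != p]
        return len(original), fixes, original[len(predicted):]

    def decompress(predicted: str, length: int, fixes, tail: str) -> str:
        chars = list(predicted[:length])
        for i, c in fixes:
            chars[i] = c
        return "".join(chars) + tail

    original = "The quick brown fox jumped over the lazy dog!"
    predicted = deterministic_model()
    length, fixes, tail = compress(original, predicted)
    assert decompress(predicted, length, fixes, tail) == original  # bit-for-bit
    print(len(fixes), "character corrections; a better model needs fewer")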


In psychoacoustics, one of the most important aspects of how lossy compression works is that they throw away sounds that a human can't even hear, because it's too short or subtle to be noticed. This is the difference between data and knowledge. Someone with perfect pitch and a good memory knows exactly what Don't Stop Me Now by Queen sounds like, and it's a lot smaller than digitizing an LP record in mint condition.

This person can cover that song, and everyone will be happy, because they have reproduced everything that makes that song what it is. If anything we might be disappointed because it's a verbatim reproduction (for some reason we prefer cover songs to introduce their own flavor).

If I ask you to play Don't Stop Me Now and you sound like alcoholic Karaoke, you haven't satisfied the request. You've lost. Actually we've all lost, please stop making that sound, and never do that again.


But that's the good thing about this competition: it is fair to everyone. The perfect-pitch version can just use a tiny amount of additional data to decompress to the same thing, while the alcoholic will have to correct for loads of differences, making that version much larger.

This difference is the same monotonically increasing function in both cases, so you basically don't have to care about it - you can fairly compare the lossy versions as well, and you will get the same result.

So the more advanced version wins, and the competition remains fair and unambiguous (otherwise, did perfect pitch A or perfect pitch B have the better cover?).


Exactly right. There would be no confusion if the Hutter prize was a competition for compressing human data.

This is a parallel issue to the ones in conversations around understanding. There's a massive gulf between being able to build an ontology about a thing and having an understanding of it: the former requires only a system of categorizing and relating elements, and has nothing to do with correctness per se; the latter is an effective (note: lossy!) compression of causal factors which generate the categories and relations. It's the difference between "Mercury, a star that doesn't twinkle, orbits Earth, the only planet in existence, and we have inferred curves that predict why it goes backwards in its orbit sometimes" and "Mercury and Earth, planets, both orbit the Sun, a star, and all stable orbits are elliptical".


How is the sum total of wikipedia not human data?


I think Wikipedia is human data. But to say it's human knowledge (that is, that data is synonymous with knowledge in any context) is actually a pretty hardline epistemological stance to take, one that needs to be carefully examined and justified.


Completely agree, and I think this is where a lot of the confusion in the thread is coming from. The prize claims to compress “human knowledge” but it’s actually a fairly rigid natural language text compression challenge.

I’d conjecture that the algorithms that do well (as measured by compression ratio) on this particular corpus will also do well on corpuses that contain minimal ‘knowledge’.

Compression algorithms that do well might encode a lot of data about syntax, word triple frequencies, and Wikipedia editorial style, but I really doubt they’ll encode much “knowledge”.


It is not the sum-total of human knowledge surely. But that it is part of that knowledge is not a controversial statement in my opinion.


I didn't say it's the sum total anywhere nor did I imply that in any way. I also didn't say it doesn't represent some aspect of human knowledge.

In set notation, what I am saying in above comments is

    ({data} ∪ {knowledge}) \ ({data} ∩ {knowledge}) ≠ ∅
while you are simply assuming (without a good reason) that I am saying

    {data} ∩ {knowledge} = ∅
If you look up the chain with the example of singing a Queen song, maybe that will help you better understand the angle that that commenter and I share about the differences between data and knowledge.


One of the big problems in this use case is that a dump of wikipedia contains both knowledge and arbitrary noise. And lossless compression has to preserve every single bit of both. It's hard to tease out "good lossy compression" from the mess, because better and better lossy compression doesn't get you arbitrarily close to the original, it only gets you somewhat close.


Who downvoted you? This is correct; the better the lossy compression, the better the lossless compression as well.


That's true until the lossy compression alters the data in a way that actually makes it harder to represent. As an example, a bitstream that 'corrects' a decompressed .MP3 file to match the original raw waveform data would be almost as difficult to compress as the raw file itself would be.

It wouldn't be a matter of simply computing and compressing deltas at each sample, because frequency-domain compression moves the bits around.


The problem with lossy compression in a competition is that in order to compare different approaches to lossy compression you have to define a metric for the quality of different lossy reconstructions of the data. This is practically impossible to do in an unbiased way. As soon as you define a quality metric, the competition becomes gaming the metric instead of useful compression.

So how do you define a metric that can't be cheated? You add an arithmetic coder after your lossy compressor, which turns it into a lossless compressor, and the size of the losslessly compressed data is your metric for the quality of your lossy compressor. It's the only metric that definitively can't be cheated.


Feel free to start your own competition where people are awarded money for a system that correctly answers some set of questions, divided by the amount of storage that system needs.

Sounds like a tough competition to run objectively, not that that makes it less worth doing, but I can see why the parameters were chosen as they were in 2006.


Ted Chiang actually explores this concept in his article on ChatGPT and other LLMs https://www.newyorker.com/tech/annals-of-technology/chatgpt-...


Lossy compression is lossless over the information you decided to care about.

You may decide that the sound outside the human range of hearing isn't interesting. Or that the colours in your camera sensor that are just thermal noise can be safely ignored. Then you can throw that away and do lossy compression.

What you care about, an algorithm can't answer for you. Sometimes you may want to keep the inaudible sounds in a recording to better understand the audible ones. Sometimes you may be glad you kept the noise in a picture as it allowed you to later identify the model of camera.

There's no context-free answer to what matters. There's no subject-free answer to what matters, there's always a "who" it matters to.


True, but then I would argue (as a layman) that AGI would need to be able to guess what matters in a given context to really be AGI.


Matters to who?

To me? In that case, it sometimes does, sometimes doesn't, depending on how lazy I've been in constraining to the algorithm what I might want.

To itself? Then who can say if it isn't doing that already?


See a related benchmark: http://www.mattmahoney.net/dc/text.html

In particular, Fabrice Bellard is leading with a transformer model on "enwik9", which is 10 times larger than the enwik8 file used in the Hutter Prize. It's doing quite well on enwik8 too, but perhaps the issue is that the "economy of scale" of on-the-fly training hasn't kicked in yet on the smaller benchmark.

Transformers are definitely lossy, but afaict Bellard's entry uses the probabilities generated by the model to create an encoding that uses fewer bits.


Lossless text compression is basically something like arithmetic coding[1] on top of a probability model. In fact the loss metric that LLMs directly optimize for is the number of bits required for lossless compression. Loss plotted in papers is generally nats/token.

The only reason LLMs are not on the leaderboard is that this is not a task for compressing human knowledge. It's a task for compressing Wikipedia, which is tiny in comparison, and the model size plays a big role.

[1]: https://en.wikipedia.org/wiki/Arithmetic_coding
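For a rough feel of that equivalence, here is the back-of-the-envelope conversion from reported loss to compressed size (all numbers below are illustrative assumptions, not measurements of any particular model):

    import math

    loss_nats_per_token = 0.9                            # assumed cross-entropy loss
    bits_per_token = loss_nats_per_token / math.log(2)   # ~1.3 bits/token

    chars = 1e9                   # enwik9 is ~10^9 bytes of text
    chars_per_token = 4           # assumed tokenizer average
    tokens = chars / chars_per_token

    compressed_bytes = tokens * bits_per_token / 8
    print(f"~{compressed_bytes / 1e6:.0f} MB, before counting the model weights")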


The compression should be semantically lossless, but lexicographically lossy. It should find ways to say the same thing in fewer characters.

I.e "Sky is blue. Ocean is blue." -> "Sky and ocean are blue."


Lossy text compression has little utility.


> Lossy text compression has little utility

You're describing every book you've ever read and learned from.


Ah, see but what they really meant to say was "Lossy text compression has a little utility" but due to lossy compression the meaning changed to be the opposite, thus proving the author's point.


No, not at all. I don't read a book to regurgitate the text in an imperfect form later. I read it to learn about the ideas and thought. The syntax itself (ie. text) is not that important.


> The syntax itself (ie. text) is not that important

So, you're preferentially discarding information you consider extraneous to your application, distilling it to a smaller representation that retains what you consider important about it.


Most image compression smooths skies. Because most people don’t want a regurgitated sky. If you’re an astronomer, on the other hand, you’ll compress away the trees. These are all lossy retrieval (and, inherent to compression, transformation) functions.


Good books aren't lossy.


> Good books aren't lossy

As someone who recently began rereading books, I heartily agree. The lossless original has value. That doesn’t mean the lossy imprint of that book in my mind “has little utility” or value.


Blinkist would like a word.


Oral tradition has had great utility for people for millennia, and it's very much lossy compression.


Depends on the use of the text. If it's to transfer ideas, then fairly easy to rewrite things so they're more concise. For example, brevity can be used without hurting comprehension.


/s?

lol


Humans can do lossy or lossless. There are plenty of people who can recite the entire Bible or Koran flawlessly.


This is more the equivalent of asking humans to create an exact copy of the text, typesetting and all, including the publishing information, page numbers, and exact linebreaks. Not just recite the text, which would be a lossy encoding of the original.

Humans are terrible at lossless encoding of information, it's what we invented machines for =D


No it is not more like that. This isn't compressing images, just the sequence of characters (or words). It is literally EXACTLY the same task as just memorizing word for word (plus punctuation), which humans are very good at, probably because our brains use a lot of strategies akin to what is used in this challenge.


How do you know it's compressed? Someone who can recite one of these probably knows interpretations, back stories, and related information that help them memorize it. The "storage" required for the recital may be larger than the text itself, if we can even think in those terms.


And there are humans that can jump 8ft in the air. Doesn't mean it's correct to say that "humans can jump 8ft in the air." Very few people are regurgitating verbatim information.


Many can recite the Koran flawlessly; it's short and heavily encouraged in education through rote repetition.

Much, much fewer can recite the bible; it's many times longer.

LLMs can also recite the bible and Koran flawlessly, given how frequently the text appears in their training material.


> LLMs can also recite the bible

Is this something you've seen demonstrated? I have no doubt they would be able to recite a lot of sections of it, but there are many sections that are less often quoted, and it's a long frickin' book. If it's just a hunch you have, I might give it a go and compare, if that's okay - I think it would be interesting to see what a given LLM knows and what it might miss.


Now you've given me the idea of coaxing new bible verses out of an LLM. Maybe we can even get one to confabulate a complete additional book! =)


That’s a strong assertion! Are there prompts that cause llms to regurgitate the bible to demonstrate this?

I'm thinking that the pigeonhole principle says that the current models are orders of magnitude too small to do this.


That's true, but it seems unlikely that that's a particularly important part of intelligence. The vast majority of people do _not_ do that type of memorization, are they still intelligent?


math is a lossless compression


Yeah, it makes no sense to say it's inspired by intelligence and then require losslessness, which is definitionally rote work and not intelligent.


Not true, a smart model could be really good at lossy compression and then you only have to store a small delta to make it lossless.


That's literally arithmetic coding which is used by all winning entries in the above so far.


Which is the exact same function of the difference for any competitor, so you can actually fairly compare them on the lossy part... like, people have been thinking about this problem for much more than the 3 minutes HNers in this comment section have.


yep!


I'm no mathematician but I don't believe this is true. Lossless information encoding requires all the original information to be present.


Arithmetic coding allows you to make a prediction and only provide bits for correction.

Have the decompressor predict the next data based on the output so far (a statistical prediction of the next data will be lossy, as it won't always be correct). If the prediction is correct you need to spend very little to confirm that. If it's incorrect you'll need to spend data to correct it. Arithmetic coding is the best way to make this work.

It's also been used by all winning entries of the Hutter prize so far.


Alright I guess I was wrong.


Technically you’re correct. If you can rebuild the original losslessly then all the original information is present, just in a different configuration.


Or at least reproducible. It could still be compressed.


What


500000 EUR is the prize pool. Each winner has to gain at least a 1% improvement over the previous record to claim a prize that is proportional to the improvement. Getting the full 500000 EUR prize requires a 100% improvement (i.e. compressing 1GB to zero bytes).


Does it or does it just require 1% improvement over the last winner? As opposed to a static additional 1% improvement vs the initial best “score”.


It's 1% over the last winner. The latest winner has a total size of 114156155, compared to previous winner of 115352938. The payout was

   500000 * (1 - 114156155 / 115352938) = 5187
(see table near "Baseline Enwik9 and Previous Records Enwik8")
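The same arithmetic in a couple of lines, using the figures quoted above:

    prize_pool = 500_000        # EUR
    previous = 115_352_938      # bytes, previous record
    latest = 114_156_155        # bytes, latest winner
    payout = prize_pool * (1 - latest / previous)
    print(round(payout))        # about 5187 EUR (and the improvement must be >= 1%)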


Probably if you succeed at this, 500,000 will be worthless to you


Why? How does this improvement translate to more financial gains?


The idea is that sufficiently powerful compression is indistinguishable from artificial intelligence, which can be used to make even more money.


Because with that knowledge, you will be able to decompress 0 dollars into infinite dollars, which the storage mafia will pay you for not publishing your breakthrough that makes them obsolete.


Let's say you really find a new, much much better compression algorithm for texts. I do believe that soon new applications will appear that will make use of the whole storage space that existing data storage devices can offer.


Are you possibly hinting at the endless stream of "new generation iWasteware camera, now with 2 gigapixel sensor" in combination with camera apps that put a (high) limit on the lowest resolution one can choose?


I did not have exactly this application in mind, but I was aware that this kind of waste does exist for many commercial applications when I was writing my reply.


sorry, who exactly is the storage mafia?


Well, if you achieve this, you'll basically have proven that something (a bunch of information) equals nothing (no information). So 1=0.

Once you have that, becoming rich is trivial. Multiplying both sides of the equation by one trillion, 1 trillion = 0. So open a bank account with $0, now you have one trillion. Easy.

A funnier (although grimmer) way: let p be the world's population. Multiplying both sides of the equation by (p-1) you get p-1=0. Thus, p=1. If you assume that you exist (which is a reasonable assumption, following Descartes's reasoning) you now own all the wealth in the world.


I think personally I can compress $1b into 0 within a week, but $1tr would be a real challenge. Another thing I should point out that I don't support compressing people (for now).


Because then you will have beaten Pied Piper in the market.


Being that smart translates to more financial gains.


Form is emptiness, emptiness is form. 1GB == 0B. Full prize money delivered upon successfully achieving enlightenment.


holographic storage, if we can harness the zero point field the way Karl Pribram postulated cognitive memory works (i.e. using the Void for storage) may actually get us to 0 "physical" B ;)


> compressing 1GB to zero bytes

Is this even possible? Don’t we need atleast some data to decompress from?


Not really 0, since any exe file will have some minimum length, and that's defined to be part of the calculation.

Conceptually you could cram it into some dozens of bytes (trivial initial state of the universe + basic physics + iteration). In practice, of course, that's ridiculous.


Depends on your definitions. I could decompress 1GB from zero bytes, with a program that is just over 1GB in length...


It's very well defined on the homepage:

> Create a Linux or Windows compressor comp.exe of size S1 that compresses enwik9 to archive.exe of size S2 such that S:=S1+S2 < L

So obviously your suggestion does not work.


Well, all right, I could use conventional compression such that S2 is zero, by making S1 what would have been S1 + S2. That still leaves the "compressed image" having zero bytes, by hiding the actual compressed image in the executable.

Note well: I am not claiming that there is any reasonable reason to do this. It's just a way to say "well acksually..."


So your way around is by baking the data into the executable instead?

How good of a "solution" is that in reality? Like, what if you were tasked to decompress multiple different files? Would you bake those into the executable as well?

Personally, I would count this way around as invalid.


Notice how you had to change the wording in this comment to make the scenario work. If the problem statement is "compressing 1GB to zero bytes" or "decompress 1GB from zero bytes" then that trick is easily called out as invalid.


Ah... I had professors who graded like that


I mean, come on man. For some reason, the nerd in me sees this and immediately adds it on my 'I really need to do this' list.

Just memories of old times doing some similar (albeit probably less challenging) competitions on TopCoder almost a decade ago, and also the curiosity to see how I would manage it now, with experience. Given that the current scores are also very far from what they estimate the lower bound to be, this is really interesting! The prize is however very misleading - per their own FAQ, the total possible payout is ~223k euros.

Definitely not thanking you for the hours I will put into this!


I think the total prize pool can be won with repeated 1% improvements by different people but never by a single person no?


I have an idea if you want to code it. You know how we can drop the vowels from sentences and still understand the sentence? What if we do that as a first stage? It may not work in every case, so you'd have to identify those. Worse comes to worst, use the full word. Probably not saving much though. Worth a try.


You got a lot of flak for what is clearly a take from someone who isn't versed in compression techniques. But as one might say to a student: you're on the right track! This idea is similar in form to "arithmetic coding", which is what people are using to chip away at this - namely, finding smaller encodings that can be used to predict common parts (maybe a recognisable word, more likely a sequence of bits or characters) of the full encoding, then cycling through storing "hints" for each part it would get wrong until it can predict the exact desired output.


I think you have it backwards, but I like the direction of thinking. There is redundancy in the language itself.

> However, vowel-only sentences were always significantly more intelligible than consonant-only sentences, usually by a ratio of 2:1 across groups. In contrast to written English or words spoken in isolation, these results demonstrated that for spoken sentences, vowels carry more information about sentence intelligibility than consonants for both young normal-hearing and elderly hearing-impaired listeners.

https://pubs.aip.org/asa/jasa/article-abstract/122/4/2365/98...


What you're describing is a form of lossy compression. Yes, it can compress the file, but you're losing some information, such that there's no way to convert it back to its original form without external information. And, as you noted, it may not work for every case, which means some information would be permanently lost.


By your logic, sign language, which mostly communicates via symbols - usually 1-2 symbols per word, no spaces - would be the best compression.


I bet you think of yourself as an idea guy.


> I bet you think of yourself as an idea guy.

What does this even mean?


It means he wants to sit back, write a lazy comment, and have someone else do all the work ("code it up", hah). Toss his brilliant ideas that are "worth a try" over the fence and move on to the next brilliant idea. Without any research or due diligence or even consideration for the comparative amount of effort it takes to have an idea, vs executing on it. Ideas are a dime a thousand.

This is the same kind of person who's got a great business idea, they just need you to write up an AirBnb clone in a couple of weeks, and they'll give you 5% of the company.


While I know it wouldn't qualify for the prize due to the size of the model, I'm curious how well a big modern LLM can compress the prize file (enwiki9) compared to the current record. (I guess ideally it would be an LLM not trained on wikipedia as well, to minimize memorizing).

On that thought, with a modern LLM's weights dwarfing the enwik9 file used by the prize, it feels like this prize was set up with the right idea to advance AGI, but with a problem several orders of magnitude too small.


The site itself says that this is the general idea, proposing that intelligence can actually be defined as the ability to compress information as much as possible and.. I guess either just rehydrate it or also derive inferences from it.

So basically like enthalpy

Interesting - if you play the universe in reverse (reversing entropy) you wind up at the Big Bang, which is the final boss of compressing everything, into a singularity.


> Interesting - if you play the universe in reverse (reversing entropy) you wind up at the Big Bang, which is the final boss of compressing everything, into a singularity.

Only if you believe that running the universe forward did not add any extra information, i.e. everything is deterministic and no randomness occurs.

(That's actually a reasonable stance to take. Quantum mechanics is probabilistic in the Copenhagen interpretation, but completely deterministic in the Many World interpretation.)


A non-local formulation of QM is deterministic. [0] It's in addition to the "many worlds" interpretation: still one universe, but the underlying reality is fully connected (to go forward you need to take into account the effects of everything).

0: https://en.wikipedia.org/wiki/De_Broglie–Bohm_theory


Yes. De Broglie Bohm theory is another example of an interpretation that's deterministic.

In practice, because of deterministic chaos it doesn't matter too much whether your lowest level theory is deterministic. (It only matters philosophically.)


It is more than likely that most physical reactions are not deterministic, and in that case the Big Bang definitely didn’t encode anything about all the state we have now. (Also, simply by Occam’s razor it seems much much more likely)


> The site itself says that this is the general idea, proposing that intelligence can actually be defined as the ability to compress information as much as possible

I don't understand how anyone can define "intelligence" like that...


It's not an unreasonable definition, if you are aware of Kolmogorov complexity and Solomonoff induction. Intelligence is intimately connected to the ability to predict and model things, and it turns out that data compression is _also_ connected to the ability of predict and model things.


It makes sense intuitively right?

If I tell you that this week it's raining on Monday, raining on Tuesday, raining on Wednesday, etc..

Compressing that information into "it's raining every day this week" requires creating an abstraction around it. Finding a pattern. Producing more order from something more chaotic.


> Compressing that information into "it's raining every day this week" requires creating an abstraction around it. Finding a pattern. Producing more order from something more chaotic.

Is there any way to determine if that process of abstraction-finding and pattern-finding will always/actually/sometimes/never result in a more compressed output than simpler approaches though?


A hunch of mine is that it's undecidable in the general case. In simple cases like this, there's likely to be a way to prove that yes, this is the most compressed thing out there.


Interestingly, compression increases entropy. So the first description is more "ordered", or less random.


> It's not an unreasonable definition, if you are aware of Kolmogorov complexity and Solomonoff induction

I was already familiar with Kolmogorov Complexity, but not _Solomonoff's theory of inductive inference_ - I skimmed the Wikipedia article just now and I think/hope I get the general gist of it - so thank you for that.

> Intelligence is intimately connected to the ability to predict and model things

I agree with that statement - but I am unsure how this relates to other aspects of intelligence - or even other notions of intelligence entirely. If we're redefining exactly what the word "Intelligence" means aren't we just moving the goalposts?

...so I'm unsure if the word "Intelligence" in your post can/should be read as a layperson would understand it, or if it's a specific, well-defined term in this field? (Like how "complexity" in algorithms does not refer to how convoluted and unmaintainable a program is, which is what a layperson would think of when an SWE person says "this program has high computational complexity" to them).

> and it turns out that data compression is _also_ connected to the ability of predict and model things.

This is where I have difficulty following your argument:

As a preface: my academic understanding of data compression only covers things like algorithms in the LZ family, Entropy coding, and (lossy and lossless) domain-specific compression schemes (like JPEG/DCT, MP3/FLAC, etc). I am aware of very interesting results coming from present-day AI approaches, like using LLMs fed only on compressed data - but these are completely outside my scope of understanding (and LLMs are still very spooky to me).

Does it matter if a scheme is lossless or lossy? In a lossless system, surely a data compression system needs to be deterministic and mechanical? If so, what room is there for "intelligence" in a rigid, statically-defined, system?

Take entropy coding for example - or even simpler probability-based compression schemes like Morse code (e.g. "E" has a high probability, so it has a very short symbol length). I just can't see the connection from things like entropy encoding to "modelling". Suppose I have a large file (CSV?) of meteorological data for a specific region over time. If it's plaintext, then I expect I can compress it better using gzip than by using a system that can (somehow!) identify some structure in the weather sensor data (the "model", right?) and then use that as the basis for a better domain-specific compression. Doing this means having to add additional metadata to the source data to describe the structure that the system identified, and then hoping that this approach beats a comparatively "dumb" approach like DEFLATE. And even then, assuming that employing that "model" really does result in smaller compressed output, how is that an example of the system having a general "intelligence"?


Sorry, I didn't get back to this until now.

I was thinking of lossless compression; lossy is another can of worms on top of lossless. Lossless compression works, in principle, by the realization that the probability distribution of all possible input strings is not flat. Some strings are more probable, and some are less probable. The reason some strings are more probable is that they result not from random processes, but from some process or algorithm that generates them. Thus, they have inherent structure to them. A random string would be a string that has no structure detectable by any algorithm smaller than the string itself. If the probability distribution of the input strings is not flat, then we can use entropy coding to describe the common case - non-random data - in fewer bits than the input.

However, the difference between specific compression algorithms and a "general compression algorithm" is the assumptions they make. Most compression algorithms don't consider the probability distribution of the "full" set of input strings; rather, they divide the input string into predictably sized chunks and consider the distribution of those chunks. This is way simpler, and lets them compress somewhat well while still using a rather "static" distribution (like Morse code), or only simple algorithms (like adaptive Huffman coding) to adapt the distribution to the input data.

But if we don't restrict ourselves to the world of "compressing a stream, message by message", but allow "intelligent" compression that may use any computable means to achieve a smaller compressed size and can adapt to the whole context of the input, we can see that "message-by-message" entropy coding is only a subset of what we can do. (And we of course _also_ have that subset in our toolbox.) But the true challenge of compression now becomes being able to find and represent approximations to the "full" distribution of the input data. That involves things like identifying structure in weather data! If the input is large enough to start with, the more complex model might be worth it. And if it isn't, then we can intelligently just decide to use DEFLATE.
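
(To make the weather-data point concrete, a minimal sketch - the synthetic temperature series and the trivial "today is roughly yesterday" model are just my own toy stand-ins, not anything from the prize:)

    # A crude model (today's value is roughly yesterday's) lets us store small
    # residuals instead of raw values; a generic backend (DEFLATE) then does
    # much better on them. A real decompressor would also need the first value
    # (and the model itself) to reconstruct the series losslessly.
    import random, zlib

    random.seed(0)
    temps = [200]                                    # tenths of a degree
    for _ in range(9999):
        temps.append(temps[-1] + int(random.gauss(0, 5)))

    raw = "\n".join(str(t) for t in temps).encode()
    residuals = "\n".join(str(b - a) for a, b in zip(temps, temps[1:])).encode()

    print(len(zlib.compress(raw, 9)))        # DEFLATE on the raw values
    print(len(zlib.compress(residuals, 9)))  # DEFLATE on model residuals: smaller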

> what room is there for "intelligence" in a rigid, statically-defined, system?

But as we can see from above, surely intelligence that is tasked with compressing stuff doesn't need to be "rigid"? The intelligence is in the compressor, not in the compressed result. The compressed result might be assembly code that unpacks itself by doing arbitrary computations, and to achieve good results, the generator of that code must be sufficiently intelligent.


There is a different but similar lossless compression benchmark which just had a new front-runner using LLMs, see https://bellard.org


If anyone is interested in a lossless compression competition, but is too intimidated by the 100MB/1GB size and the level of optimization already achieved here - you can try the http://golf.horse/ challenges, which include the Wordle wordlist, Pokemon, and the OEIS database


I don't understand why they sum the size of the compressor with the combined compressed file and decompressor. I think the compressed file and decompressor would make an ungameable challenge on their own. Their FAQ section 'Why is Compressor Length superior to other Regularizations?' is satisfied equally well by the decompressor length.


I had the same question, and I think this FAQ entry answers it fairly well: http://prize.hutter1.net/hfaq.htm#addcomp


Thanks, that one answers my question.


It's interesting to see a new winner of a different nationality beating the multiple previous records set by the same guy, and what appears to be another one who beat him 2 years ago; for some (cultural?) reason, there seems to be a historical association between Soviets and advanced data compression technologies in general, possibly starting with Markov's work in the 19th century.


The causation for this might run along the lines of having plenty (in absolute terms) of well educated smart folks but pretty low wages.

Most of the really smart people in Europe and the US for example already have high paying jobs and thus less time for these relatively low paying competitions.


Why does the formula use the size of the compressor? Intuitively I would expect the size of the decompressor and the compressed data to be the only thing that really matters for knowing how much the total information has been compressed.


Otherwise you could just store the whole text in the decompressor.

Another way is to keep the file secret and just output a score, but if you want to use a public file, you have to include the program in the calculated size.


> Otherwise you could just store the whole text in the decompressor.

I don't think GP was asking about the decompressor, but specifically the compressor.

The FAQ has a fairly detailed answer for this though: http://prize.hutter1.net/hfaq.htm#addcomp


Yes, I didn't read the comment carefully and quickly assumed it was the standard question about including the decompressor.


Thanks, I didn't see this part of the FAQ!


Perhaps because they include metrics for compression time, which could easily be gamed by doing significant precomputation in the compressor?


Related. Others?

Saurabh Kumar's fast-cmix wins €5187 Hutter Prize Award - https://news.ycombinator.com/item?id=36839446 - July 2023 (1 comment)

Hutter Prize Submission 2021a: STARLIT and cmix (2021) - https://news.ycombinator.com/item?id=36745104 - July 2023 (1 comment)

Hutter Prize Entry: Saurabh Kumar's “Fast Cmix” Starts 30 Day Comment Period - https://news.ycombinator.com/item?id=36154813 - June 2023 (5 comments)

Hutter Prize - https://news.ycombinator.com/item?id=33046194 - Oct 2022 (3 comments)

Hutter Prize - https://news.ycombinator.com/item?id=26562212 - March 2021 (48 comments)

500'000€ Prize for Compressing Human Knowledge - https://news.ycombinator.com/item?id=22431251 - Feb 2020 (1 comment)

Hutter Prize expanded by a factor of 10 - https://news.ycombinator.com/item?id=22388359 - Feb 2020 (2 comments)

Hutter Prize: up to 50k € for the best compression algorithm - https://news.ycombinator.com/item?id=21903594 - Dec 2019 (2 comments)

Hutter Prize: Compress a 100MB file to less than the current record of 16 MB - https://news.ycombinator.com/item?id=20669827 - Aug 2019 (101 comments)

New Hutter Prize submission – 8 years since previous winner - https://news.ycombinator.com/item?id=14478373 - June 2017 (1 comment)

Hutter Prize for Compressing Human Knowledge - https://news.ycombinator.com/item?id=7405129 - March 2014 (24 comments)

Build a human-level AI by compressing Wikipedia - https://news.ycombinator.com/item?id=143704 - March 2008 (4 comments)


One would think it ought to be possible to automate generating these overviews. Or has it been automated and is it being posted with a non-service account because it's partially but not fully automated?


Pretty sure Dang has some nice scripts that generate this, and I expect he does a little manual pruning before posting it. He’s very precise in his choices, definitely applies some intuition/judgement. We can’t losslessly compress him — yet!


This is probably a dumb question, but I recall from (way) back in my college days studying Shannon's source coding theorem that there are calculable limits regarding how much a data stream can be compressed. Has this limit been calculated for the sample file in question? And apologies if this was in the linked article or one of the footnotes.


To establish bounds in this way, you have to start with some claim about the distribution of the input data. In this case, the data is natural human language, so it's difficult or impossible to directly state what distribution the input was drawn from. Even worse, the prize is for compressing a particular text, not a sample from a distribution, so tight bounds are actually not possible to compute.

There is some discussion on the Hutter prize page, under "What is the ultimate compression of enwik9?"

http://prize.hutter1.net/hfaq.htm


From this article:

More empirically, Shannon's lower estimate suggests that humans might be able to compress enwik9 down to 75MB, and computers some day may do better.


Notable that enwik8/9 isn't really just human text - a good ~half of the data is random XML markup which may not have the same properties as text.


The current best compresses it to <15MB because it's not a human, which is what that estimate is referring to. :)


That's enwik8. enwik9 is an order of magnitude larger.


No. In fact there are limits, but unless you've already written a 'perfect compression tool' you can't actually know the limit.

https://en.wikipedia.org/wiki/Kolmogorov_complexity is ultimately what you're looking for, btw. Shannon is more about limits on transmission rate given noise, but Kolmogorov dealt with the limits of compression, which are actually uncomputable.


  Q: Why do you restrict to a single CPU core and exclude GPUs?
  A: The primary intention is to limit compute and memory to some generally available amount in a transparent, easy, fair, and measurable way. 100 hours on one i7 core with 10GB RAM seems to get sufficiently close to this ideal
 
Sorry, who are these people that don't have a GPU? Even laptops have GPUs. Why would you spend 100 hours on an "i7" (which generation? 4790K or six times faster 13700k?) CPU when you can achieve orders of magnitude better performance on a consumer GPU that literally everyone has access to?


> Sorry, who are these people that don't have a GPU

Literally millions of people. Especially circa early 2000's when this started.

> GPU that literally everyone has access to?

They don't. (Money is a barrier to access).

> Why would you spend 100 hours on an "i7"

Because it's what they had/have.

This a competition to move understanding forward, not to see who has the biggest budget.


The level of privilege that some people are completely blind to is just incredible.


Sure, in the early 2000s. But today literally every laptop or desktop computer or phone with a touchscreen has a GPU. Even most feature phones have GPUs, I'd actually be interested to see if you could find a currently manufactured one that doesn't. And they have all been programmable for a very long time now. And almost without exception they have more FLOPS than all of the CPUs in the same machine put together, yes, even the integrated GPUs.


In which case: https://news.ycombinator.com/item?id=37502782

And 20 years changed A LOT of things; it turns out the folks who can't afford a desktop or laptop don't need to, they have a cell phone. And even cell phones have GPUs that are more powerful than the CPUs of 20 years ago.


The first winner didn't even have a computer!

    Q: What if I can (significantly) beat the current record?

    A: In this case, submit your code and win the award and/or copyright your code and/or patent your ideas. You should be able to monetize your invention beyond the HKCP. This happened to the first winner, a Russian/Ukrainian who always had to cycle 8km to a friend to test his code because he did not even have a suitable computer, and who now has a lucrative job at QTR in Canada.
http://prize.hutter1.net/hfaq.htm#fame


>lucrative job at QTR in Canada

Google failed me on this one. What company/entity is QTR?


Quantum Technology Recruiting Inc. (https://ca.linkedin.com/company/quantum-technology-recruitin...)

Our guy, Alex Rhatushnyak, is listed as employee.

BTW, this is the first result on DuckDuckGo. (https://duckduckgo.com/?q=qtr+company+canada)

Is it time to switch? ;)


Note that the competition is close to 20 years old

...though I also had a GPU in 2006, so idk. Then again, you need to define something as reference hardware and it doesn't really matter what it is. Better compression should win out over less-good compression no matter if you run both on a 100-core system or a 1-core system, I think?


In the category "then update your FAQ, you've had many, many years to do so" =D

(not to change the rules, but to explain why the rules haven't changed, and to clarify phrasing that was super vague even back when the 3dfx Voodoo was the pinnacle of graphics. Level playing fields are a worthwhile pursuit)


anecdata: for my entire life, I have never, until about 3 weeks ago, owned a laptop with any type of GPU other than integrated graphics built into the motherboard. for most of my life in fact I have never really had any modern or really serviceable hardware, as I've basically inherited all of my compute devices secondhand and never had the money to just go out and buy a decent rig (or to even put one together for that matter - perhaps that will elucidate the economic sector which I have basically always occupied. well, not always, as now that I'm homeless, I somehow went even LOWER!)

though it truly does excite me that I finally have something with some type of GeForce-branded graphics device inside of it - now I can finally run local machine learning tasks and not have to cloud it out


My two-year-old high-end productivity ThinkPad has an Intel Iris Xe display adapter, which I would hesitate to call a GPU.


Maybe commodity servers with integrated and probably never used graphics capabilities that no one would actually describe as a GPU?


I do think it's interesting that recent submissions use nearly the entire 50 hours. I wonder how much better people could do if faster hardware was allowed.


Absolutely they could. All the entries so far can do even better with more RAM. The PAQ entries have command line flags to do this; that's already known, in fact.

What this is looking for is fundamental improvements, not "i brute forced a known way to win this competition".


An element of compression is usually a search problem - 'let's try all these ways to encode the data, and see which is smallest'.

Therefore, to maximize compression, you tweak parameters to search as hard as possible, and so you use all of the allowed time.
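
(In miniature, with stock codecs standing in for the real search over model parameters - just a sketch, not how actual entries work:)

    # Try several encodings of the same data and keep whichever is smallest.
    # Real entries search over model parameters rather than off-the-shelf codecs.
    import bz2, lzma, zlib

    def smallest(data: bytes):
        candidates = {
            "zlib-1": zlib.compress(data, 1),
            "zlib-9": zlib.compress(data, 9),
            "bz2-9": bz2.compress(data, 9),
            "lzma-9": lzma.compress(data, preset=9),
        }
        name, best = min(candidates.items(), key=lambda kv: len(kv[1]))
        return name, len(best)

    print(smallest(b"the rain in spain falls mainly on the plain " * 1000))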


This competition was from 2006.

The consensus among AI experts (really just everyone) during the 1990s/2000s was that:

- AGI could be achieved by giant GOFAI (usually expert systems/knowledge bases) projects (like OpenCog and Cyc).

- ... or that AGI development is limited by lack of knowledge about key insights (mostly symbolic/rational) into intelligence, not computation/data. (IE AGI was viewed similarly to proving that NP = P or other very-hard math/computer science/psychology/philosophy problems).

- ... or that AGI will be achieved through brain scanning/connectomics.

- ... or that AGI is impossible.

Nobody (except for LeCun and Schmidhuber) paid much attention to neural networks until AlexNet (2012) showed that they could be run and trained quickly and could beat the symbolic competition. In the 2000s, only a real, actual psychic would have been able to tell you that LLMs would be a valuable path for AGI research.

Here is a list of various expert (and "expert") perspectives on AGI during the 1990s/2000s (notice how nobody is talking about neural networks, and they are definitely not talking about anything remotely close to a LLM or transformer):

> Copycat is a computer program designed to be able to discover insightful analogies, and to do so in a psychologically realistic way. Copycat's architecture is neither symbolic nor connectionist, nor was it intended to be a hybrid of the two (although some might see it that way); ... [describes a very symbolic system to our modern day eyes, though it was not really symbolic to 1990s AI researchers]

- Douglas Hofstadter and Melanie Mitchell, Fluid Concepts and Creative Analogies (Chapter 5), 1995

> Interviewer: Are you an advocate of furthering AI research?

> Dennett: I think that it’s been a wonderful field and has a great future, and some of the directions are less interesting to me and less important theoretically, I think, than others. I don’t think it needs a champion. There’s plenty of drive to pursue this research in different ways.

> Dennett (cont): What I don’t think it’s going to happen and I don’t think it’s important to try to make it happen; I don’t think we’re going to have a really conscious humanoid agents anytime in the foreseeable future. And I think there’s not only no good reason to try to make such agents, but there’s some pretty good reasons not to try. Now, that might seem to contradict the fact that I work on a Cog project [sic] with MIT, which was of course is an attempt to create a humanoid agent, cogent, cog, and to implement the multiple drafts model of consciousness; my model of consciousness on it.

> Dennett (later): [Cog is intended as a] proof of concept [for AGI]. You want to see what works but then you don’t have to actually do the whole thing.

- Daniel Dennett, Daniel Dennett Investigates Artificial Intelligence, Big Think, 2009

> [Context: Marvin Minsky had a speech where he talked about how expert systems don't work, because they do not have any common sense (and the only solution seems to be to create a giant AGI project (without automatic data gathering)).]

> Only one researcher has committed himself to the colossal task of building a comprehensive common-sense reasoning system, according to Minsky. Douglas Lenat, through his Cyc project, has directed the line-by-line entry of more than 1 million rules into a commonsense knowledge base.

- Mark Baard, AI Founder Blasts Modern Research, Wired, 2003

> Section 1 discusses the conceptual foundations of general intelligence as a discipline, orienting it within the Integrated Causal Model of Tooby and Cosmides; Section 2 constitutes the bulk of the paper and discusses the functional decomposition of general intelligence into a complex supersystem of interdependent internally specialized processes, and structures the description using five successive levels of functional organization: Code, sensory modalities, concepts, thoughts, and deliberation. Section 3 ... [yada yada, this is old, wrong stuff]

- Eliezer Yudkowsky, Levels of Organization in General Intelligence, 2007, Machine Intelligence Research Institute

I could list more examples, but I have spent way too long on this post. What I will say is that Hutter probably had the most correct idea of how modern semi-general AI would work (from the 2000s). He figured out that compression is an extremely important component of intelligence > 10 years before everybody was doing LLMs. That is impressive.

I should probably write a blog post over this.


Trying to figure out who you're replying to, though - none of this has anything to do with compression algorithms.


> Being able to compress well is closely related to intelligence as explained below. While intelligence is a slippery concept, file sizes are hard numbers. Wikipedia is an extensive snapshot of Human Knowledge. If you can compress the first 1GB of Wikipedia better than your predecessors, your (de)compressor likely has to be smart(er). The intention of this prize is to encourage development of intelligent compressors/programs as a path to AGI.

- Marcus Hutter, http://prize.hutter1.net/


okay, but why post that as a reply to a complaint about how GPUs can't be used and how an "i7" is not one specific CPU? You almost certainly wanted to just post that as its own comment instead of a reply to anyone.


> So the ultimate compressor of it should "understand" all human knowledge, i.e. be really smart. enwik9 is a hopefully representative 1GB extract from Wikipedia.

I spotted 2 really compelling examples of "compression is AI" posted recently.

ChatLZMA: https://news.ycombinator.com/item?id=37318810

Ziplm: https://news.ycombinator.com/item?id=36732430


Related:

Homepage of Marcus Hutter:

http://www.hutter1.net/


I worked on this a bit and potentially had an improved result over the current leader, but it felt a bit cheap since I was doing it manually for this data set rather than for data in general.


Does it require participants to give up all their source code, as is usual in these competitions? I wonder what the free-market price for the winning system would be.


Small font — great place to start.


Middle out ftw


I will now compress all of human knowledge: fuck

/s


Easy! Write a compression algorithm that works as follows: if you detect that the input is the first GB of Wikipedia, return nothing. Else return the compressed artifact of your compression algorithm of choice. To decompress, running the decompression algorithm on an empty input returns the first GB of Wikipedia; otherwise you use whatever algorithm you used earlier :)

This is inspired by HQ9++: https://www.dangermouse.net/esoteric/hq9plusplus.html


Not sure if you’re joking or not, but the prize is for the smallest archive + compressor.

The best you could do with this approach is to use the best existing compressor and the compressed text (needed to check the input). With the extra test you’d end up doing slightly worse than the previous winner.
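
(A sketch of why the special-casing doesn't pay off - the hash constant and embedded copy below are placeholders, and the real rules measure the compressor plus a self-extracting archive rather than a separate decompressor, but the size accounting works out the same way:)

    # The "if it's enwik9, output nothing" trick: the empty archive is free,
    # but reproducing enwik9 still requires a full (compressed) copy baked
    # into the program, and program size is part of the measured total.
    import hashlib, lzma

    ENWIK9_SHA256 = "..."   # placeholder: hash used only to detect the special case
    EMBEDDED_COPY = b"..."  # placeholder: lzma-compressed enwik9 shipped with the program

    def compress(data: bytes) -> bytes:
        if hashlib.sha256(data).hexdigest() == ENWIK9_SHA256:
            return b""      # "free" archive for the special case
        return lzma.compress(data)

    def decompress(archive: bytes) -> bytes:
        if archive == b"":
            return lzma.decompress(EMBEDDED_COPY)  # the copy still has to live somewhere
        return lzma.decompress(archive)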


> Rules

> Publish a compression program comp9.exe that outputs archive9.exe given input enwik9.

> If archive9.exe is run with no input, it reproduces 10^9 byte file data9 that is identical to enwik9.

> Total size is measured as S := length(comp9.exe/zip)+length(archive9.exe).

In other words the input is indeed set to zero, but the size of the .exe is actually measured. And even stricter, the thing that made archive9.exe is included as well (which I think is a bit mean)



