So you don’t think that writing a wiki article about this could be made smaller by encoding this info in a few logical steps, and adding some metadata about what kind of sentence should follow what? That part about decompressing it is the AI part. Where to place a comma can be added at roughly constant cost by any contending program.
Sure, my point is more that this comma-adding business dwarfs any real prediction.
Suppose there were a superintelligence that figured out the theory of everything for the universe. It's unclear that would actually help with this task. You could likely easily derive things like gravitation, chemistry, etc., but the vast majority of your bits would still be used attempting to match the persona and wording of the various wikipedia authors.
This superintelligence would be masked by some LLM that is slightly better at faking human wording.
But that comma will have the exact same price between 2 contending lossy compressions. In fact, it is a monotonic function of the difference, so the better your lossy compression is, the better your arithmetic one will be — making you measure the correct thing in an objective way.
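To make that concrete, here is a toy back-of-the-envelope sketch (in Python; the per-character probabilities are made up, not taken from any real compressor) of why the final arithmetic-coded cost tracks how well the lossy model predicts the exact wording:

```python
import math

# Toy sketch: under arithmetic coding, reproducing the exact original text
# costs -log2 of the probability the model assigned to it. The per-character
# probabilities below are invented purely for illustration.
def coding_cost_bits(char_probs):
    # char_probs: probability the model gave to each actual character of the
    # original text, in order.
    return -sum(math.log2(p) for p in char_probs)

better_model = [0.9, 0.8, 0.95, 0.7]  # usually confident about the real wording
worse_model  = [0.5, 0.3, 0.6, 0.2]   # hazier about the real wording

print(coding_cost_bits(better_model))  # ~1.1 bits
print(coding_cost_bits(worse_model))   # ~5.8 bits
```

The only thing that matters to the final size is how much probability the model puts on the text that is actually there.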
It’s like, smart people have spent more than 3 minutes on this problem already.
> But that comma will have the exact same price between 2 contending lossy compressions.
Why do you think that? Do you have proof of that?
> making you measure the correct thing
If we're trying to measure knowledge, then the exact wording is not part of being correct.
Very often you will have to be less correct to match wikipedia's wording. A better lossy encoding of knowledge would have a higher cost to correct it into a perfect match of the source.
> Why do you think that? Do you have proof of that?
You want to encode “it’s cloudy, so it’ll rain”. Your lossy, intelligent algorithm comes up with “it is cloudy so it will rain”.
You save the diff and apply it. If another, worse algorithm can only produce “it’s cloudy so sunny”, it will have to pay more in the diff, which scales with the number of differences between the produced and original string.
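A rough way to see it, reusing the strings from the example above and a byte-level diff as a crude stand-in for the real correction cost:

```python
import difflib

original    = "it's cloudy, so it'll rain"
smart_guess = "it is cloudy so it will rain"  # the intelligent algorithm's guess
worse_guess = "it's cloudy so sunny"          # the worse algorithm's guess

def patch_size(guess, target):
    # Count the characters touched by every insert/replace/delete needed to
    # turn the guess into the target - a crude stand-in for the diff cost.
    ops = difflib.SequenceMatcher(None, guess, target).get_opcodes()
    return sum(max(i2 - i1, j2 - j1) for tag, i1, i2, j1, j2 in ops if tag != "equal")

print("smart guess patch:", patch_size(smart_guess, original))  # smaller patch
print("worse guess patch:", patch_size(worse_guess, original))  # larger patch
```

The exact numbers depend on the diff algorithm, but the patch grows with how far the guess is from the original.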
You can be less correct if that cumulatively produces better results; that's the beauty of the problem - the last-mile cost is the same function of the difference for everyone.
How about "it is cloudy so it will rain" and "it's cloudy, so sunny"? Then since we're looking at the commas for this argument, the second algorithm is paying less for comma correction even though it's much more wrong.
You seem to be assuming that a less intelligent algorithm is worse at matching the original text in every way, and I don't think that assumption is warranted.
I'll rephrase the last line from my earlier post: What if wikipedia is using the incorrect word in a lot of locations, and the smart algorithm predicts the correct word? That means the smart algorithm is a better encoding of knowledge, but it gets punished for it.
In that case the last mile cost is higher for a smart algorithm.
And even when the last-mile cost is roughly the same, the bigger a share of the total it becomes, the harder it is to measure anything else.
And it shuns any algorithm that's (for example) 5% better at knowledge and 2% worse at the last mile, even though such a result should be a huge win. There are lots of possible ways to encode knowledge that will drag things just a bit away from the original arbitrary wording. So even if you use the same sub-algorithm to do the last mile, it will have to spend more bits. I don't think this is an unlikely scenario.
> Then since we're looking at the commas for this argument, the second algorithm is paying less for comma correction even though it's much more wrong.
And? It will surely have to be on average more correct than another competitor, otherwise its size will be much larger.
> What if wikipedia is using the incorrect word in a lot of locations,
Then you write s/wrongword/goodword for a few more bytes. It won't be a deciding factor, but to beat trivial compressors you do have to be smarter than just looking at the data - that's the point.
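As a toy illustration of that substitution trick (the sentence and the word pair here are invented): the decoder first emits the wording the model prefers, then applies a tiny patch table to restore what the source actually wrote:

```python
# Toy sketch of the s/wrongword/goodword idea: the lossy model predicts the
# word it thinks is right, and a small patch table swaps it for the word the
# source actually used. Sentence and word pair are made up for illustration.
predicted = "There are fewer than ten such articles."
patch_table = {"fewer": "less"}  # costs only a few bytes to store

def apply_patches(text, patches):
    for model_word, source_word in patches.items():
        text = text.replace(model_word, source_word)
    return text

print(apply_patches(predicted, patch_table))  # "There are less than ten such articles."
```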
> And it shuns any algorithm that's (for example) 5% better at knowledge and 2% worse at the last mile
That's not how it works. With all due respect, much smarter people than us have been thinking about it for many years - let's not try to make up reasons why it's wrong after thinking about it badly for 3 minutes.
> And? It will surely have to be on average more correct than another competitor, otherwise its size will be much larger.
It's possible to have an algorithm that is consistently closer in meaning but also consistently gets commas (or XML) wrong and pays a penalty every time.
Let's say both that algorithm and its competitor are using 80MB at this stage, before fixups.
Which one is more correct?
If you say "the one that needs fewer bytes of fixups is more correct", then that is a valid metric but you're not measuring human knowledge.
A human knowledge metric would say that the first one is a more correct 80MB lossy encoding, regardless of how many bytes it takes to restore the original text.
> Then you write s/wrongword/goodword for a few more bytes. It won't be a deciding factor
You can't just declare it won't be a deciding factor. If different algorithms are good at different things, it might be a deciding factor.
> That's not how it works. With all due respect, much smarter people than us have been thinking about it for many years - let's not try to make up reasons why it's wrong after thinking about it badly for 3 minutes.
Prove it!
Specifically, prove they disagree with what I'm saying.