I believe almost all LLMs are trained on Wikipedia these days, so compressing Wikipedia well without counting the size of the LLM in the result is a bit of a cheat. One could argue that at this point it's a universal dataset representing an understanding of the English language and real-world relationships, but it still feels like a cheat.
There's a reason compression benchmarks often include the size of the decompressor executable when computing compression ratios. That said, Matt Mahoney's Large Text Compression Benchmark[0] does currently have a transformer model in first place.
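To make the point concrete, here's a rough sketch of how counting (or not counting) the decompressor changes the picture. All the byte figures are made-up placeholders, not actual benchmark results:

```python
# Hypothetical numbers only, to illustrate why decompressor size matters.

def effective_ratio(original_bytes, compressed_bytes, decompressor_bytes):
    """Compression ratio counting the decompressor (code + any model weights)."""
    total = compressed_bytes + decompressor_bytes
    return original_bytes / total

original = 1_000_000_000  # ~1 GB of text, roughly the size of enwik9

# Classic compressor: small binary, modest compression (placeholder numbers)
print(effective_ratio(original, 180_000_000, 1_000_000))      # ~5.5x

# LLM-based compressor with the model excluded: looks amazing
print(effective_ratio(original, 80_000_000, 0))               # 12.5x

# Same output, but counting a multi-GB model as part of the decompressor
print(effective_ratio(original, 80_000_000, 7_000_000_000))   # ~0.14x
```

Once the model weights are part of the denominator, "compressing" Wikipedia with a model already trained on Wikipedia stops looking like free lunch.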