A natural question I've pondered from time to time is whether Fabrice Bellard is really a time traveler from a more advanced civilization in the future, sent back in time to show us, mere mortals, what humankind will be capable of in the future.
If this sounds far-fetched, consider that he has created FFMPEG, QEMU, LibBF, SoftFP, BPG, TinyEMU, a software implementation of 4G/LTE, a PC emulator in Javascript, the TCC compiler, TinyGL, LZEXE, and a tiny program for computing the biggest known prime number.
And that's just a partial list of his successful projects, which now of course also include software for more efficient lossless compression with deep LSTM and Transformer neural networks.
Any of these projects, on its own, would be considered a notable achievement for an ordinary human being.
Fabrice Bellard deserves some kind of superhuman lifetime achievement award.
And this is why Fabrice Bellard is so often cited as an example of a 10x programmer: his relatively simple LSTM-based neural net compressor is 10x faster than one of the more complex state of the art algorithms while delivering comparable compression ratios.
"his relatively simple LSTM-based neural net compressor is 10x faster than one of the more complex state of the art algorithms while delivering comparable compression ratios."
No, it's 10x faster than an experimental, completely performance-unoptimized LSTM model that is far from state-of-the-art on its own. From the paper:
"Regarding the speed, our results show that our implementation is much faster than lstm-compress[8] with a similar model and gain. The speed improvement comes from the more optimized matrix multiplication implementation and the larger batch size (16 instead of 1)."
Of most interest here is the library used, LibNC, which might allow us to speed up cmix a bit (even making it 2x faster would already help a lot in testing).
Fair enough. cmix is awesome work, obviously, and I'm looking forward to the ideas becoming production-ready. Probably the most notable contribution by Fabrice Bellard is having something relatively simple.
He's like 1000000x programmer if you consider the kinds of projects he successfully completes. An "average" programmer never does anything anywhere near as impactful as ffmpeg throughout the entirety of their career. And he has multiple of those under his belt.
“We presented a practical implementation of an LSTM based lossless compressor. Although it is slow, its description is simple and the memory consumption is reasonable compared to compressors giving a similar compression ratio. It would greatly benefit from a dedicated hardware implementation.”
I only skimmed through the paper, but looks to me like this might be a 10X improvement over other LSTM compression methods, not state-of-the-art compression.
That said, I’m far from being an expert in compression and was hoping from someone in HN to explain how relevant this is in their discipline.
The comment you are replying to was trying to explain the "10X programmer" idea. Nowhere was it claimed that the algorithm would have a 10X performance improvement.
No, I meant this comment: "His relatively simple LSTM-based neural net compressor is 10x faster than one of the more complex state of the art algorithms"
I meant it as something of a joke (10x compression speed vs 10x the productivity of the average programmer), but also to summarize useful information from the paper.
I would add that lstm-compress from Byron Knoll, the author of CMIX, already achieved comparable compression ratios at 10x the speed, as is shown in the paper.
10x the speed of what? lstm-compress achieves a ratio of 1.64 bpb with a speed of 4.4 kB/s, Bellard's LSTM (small) achieves the same ratio with a speed of 41.7 kB/s.
Interesting to see another Fabrice project, and great to see compression performance on par with the best available, but unfortunately it still falls short of what's necessary to win this prize:
Arithmetic encoding https://en.wikipedia.org/wiki/Arithmetic_encoding . The way I would put it is: the statistical model does its best to predict the next bits, assigning short bit strings to what it thinks is likely to come next and long bit strings to less likely next bits, and if it is indeed right, only the short bitstrings need to be emitted; if it's wrong, more long bit strings must be spent to correct it and emit the exact right bitstring. (Like using shorthand with lots of convenient shortcuts, and then falling back to painfully writing everything out explicitly.) So, you can use imperfect lossy probabilistic models, while still getting lossless compression.
Arithmetic coding is a standard compression technique. It uses less bits for more frequent characters (think Huffman code, even though they are not the same).
This one makes arithmetic coding "adaptive". Consider this. The rough frequency of `e' in English is about 50%. But if you just seen this partial sentence "I am going to th", the probability/frequency of `e' skyrockets to, say, 98%. In standard arithmetic coding scheme, you would still parametrize you encoder with 50% to encode the next "e" despite it's very likely (~98%) that "e" is the next character (you are using more bits than you need in this case), while with the help of a neural network, the frequency becomes adaptive.
The NN is used for the probability model for the compression - it's not the compression itself. In simplest terms, think of it as being used to predict the next bit.
For every time you say "LSTM is industry grade" I would point you to large companies (Google-level) which train their acoustic models with TCN architectures.
It's rare that a model introduced in a paper has fair comparisons. You'll want to see it used by many different authors before one can speak with confidence on claimed improvements.
In this paper, Bellard makes a comparison to the Transformer, some variant of which holds most NLP records. Of note is that in this test, the Transformer did not outperform the LSTM. Is it that Transformers need to be larger to outperform or LSTMs are more suited to compression or that more epochs are required? This deserves further investigation, what are the required conditions for Transformers to underperform LSTMs?
The problem in the paper is to predict most probable "words". In NLP terms, model must be optimized for perplexity. Transformer holds records not for perplexity (it is too heavy for that) but for other tasks like machine translation or semantic similarity, I believe.
If this sounds far-fetched, consider that he has created FFMPEG, QEMU, LibBF, SoftFP, BPG, TinyEMU, a software implementation of 4G/LTE, a PC emulator in Javascript, the TCC compiler, TinyGL, LZEXE, and a tiny program for computing the biggest known prime number.
And that's just a partial list of his successful projects, which now of course also include software for more efficient lossless compression with deep LSTM and Transformer neural networks.
Any of these projects, on its own, would be considered a notable achievement for an ordinary human being.
Fabrice Bellard deserves some kind of superhuman lifetime achievement award.
Source: https://bellard.org