As others have commented, it is essentially a compression algorithm in disguise, but it does have some tricks:
- js13kGames measures the file size by the size of a ZIP archive, where the ZIP file itself is provided by the participant. In the past people used ZIP recompressors like ADVZIP or ECT to minimize the ZIP file. Roadroller needs to store a large amount of binary data, but JS string literals are inefficient in terms of information entropy (~6.96 bits per byte [1]). Therefore Roadroller actually stores a 6-bit code, which can be encoded with a very efficient Huffman tree. The 6-bit code is ASCII-only, so it is also much easier to handle. (A sketch of this packing follows the footnotes.)
- Roadroller tunes a lot of compression parameters during its optimization phase. Having to bring your own decoder is normally a curse, but it means Roadroller can generate a decoder optimized specifically for the given compressed data.
- Roadroller is allowed to transform the input JS code, so it strategically inserts whitespace around tokens for better compression (see the Packer.prepareJs comments for more information). It doesn't necessarily produce the most minified code, since Roadroller currently doesn't care about JS syntax and only does tokenization, but it will produce the most compressible code.
[1] Assuming UTF-8. If the file is encoded in ISO-8859-1 or similar this can be greatly improved: only 4 or 5 bytes are forbidden [2] out of 256 possible octets, so up to log2(252) ≈ 7.977 bits per byte is possible.
[2] $, `, \ and \r. When the code is included in a <script> element, \0 is also forbidden.
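For illustration, here is a minimal sketch of the 6-bit packing idea (this is not Roadroller's actual alphabet or codec, just the general technique): arbitrary bytes are repacked into a 64-character ASCII alphabet, 6 bits per character, so every output character is safe inside a JS string literal and the Huffman tree only has to cover 64 distinct symbols.

```js
// Illustrative 6-bit packer (not Roadroller's actual scheme). Packs bytes
// into a 64-character ASCII alphabet that avoids characters needing
// escapes in a template literal ($, `, \, \r).
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function pack6(bytes) {
  let out = '', acc = 0, nbits = 0;
  for (const b of bytes) {
    acc = (acc << 8) | b;      // append 8 input bits
    nbits += 8;
    while (nbits >= 6) {       // emit full 6-bit groups
      nbits -= 6;
      out += ALPHABET[(acc >> nbits) & 63];
    }
  }
  if (nbits > 0) out += ALPHABET[(acc << (6 - nbits)) & 63]; // pad the tail
  return out;
}

console.log(pack6(new TextEncoder().encode('hello'))); // "aGVsbG8" (base64-like)
```

With at most 64 distinct symbols in the payload, the DEFLATE stream inside the ZIP needs only a small literal alphabet, so the Huffman overhead stays low and each stored character carries close to its full 6 bits.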
That's wild. Would there ever be a use case for this in a normal web app? I.e., could decoding ever take less time than downloading the extra data?
Absolutely not. Roadroller currently targets a maximum latency of 1 second for "typical" inputs (~50 KB), which translates to a decompression rate of ~50 KB/s. For decompression to yield any net gain, the communication channel would have to be much slower than that (~5 KB/s), and at that point we would just avoid JS anyway.
It is possible to make it an order of magnitude faster (the decoder is size-optimized, not speed-optimized), but its usefulness would still be limited. The algorithm (context mixing) runs a large number of context models during both compression and decompression, so its run time is proportional to the number of bytes being compressed. In other words, if the input size is 1 MB, the run time is proportional to that 1 MB no matter whether it compresses to 200 KB or 10 KB. The only way to make decompression faster is to reduce the input size via preprocessing; however, to be effective that preprocessor would itself have to run the context models. (A toy sketch of the per-bit cost follows.)
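To make the cost concrete, here is a toy context-mixing predictor (nothing like Roadroller's real model set, just the shape of the loop): two trivial bit models are combined by logistic mixing, and every input bit consults and updates every model, so the work scales with the uncompressed bit count regardless of the coded size.

```js
// Toy context-mixing sketch (not Roadroller's actual models): an order-0
// model and a "previous bit" model, combined by logistic mixing. Every
// input bit touches every model, so run time tracks the input size.
const squash = x => 1 / (1 + Math.exp(-x));   // logit -> probability
const stretch = p => Math.log(p / (1 - p));   // probability -> logit

function idealCodeLength(bits) {
  const m0 = [1, 1];                // order-0 bit counts
  const m1 = [[1, 1], [1, 1]];      // counts keyed by the previous bit
  const w = [0.5, 0.5];             // mixer weights
  let prev = 0, total = 0;
  for (const bit of bits) {         // O(input bits), however small the output
    const p0 = m0[1] / (m0[0] + m0[1]);
    const p1 = m1[prev][1] / (m1[prev][0] + m1[prev][1]);
    const st = [stretch(p0), stretch(p1)];
    const p = squash(w[0] * st[0] + w[1] * st[1]);  // mixed prediction
    total += -Math.log2(bit ? p : 1 - p);           // ideal arithmetic-coded cost
    const err = bit - p;                            // online mixer update
    for (let i = 0; i < 2; i++) w[i] += 0.01 * err * st[i];
    m0[bit]++; m1[prev][bit]++; prev = bit;
  }
  return total;
}

// 1000 heavily biased bits code to far fewer than 1000 bits, yet the loop
// still runs 1000 iterations: work is proportional to input, not output.
const bits = Array.from({ length: 1000 }, () => (Math.random() < 0.9 ? 1 : 0));
console.log(idealCodeLength(bits).toFixed(1), 'bits to code 1000 input bits');
```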
I did hear of a semi-serious use case where an embedded device has to ship a JS front end for its API and its ROM is limited and can't be expanded. In that case it would make more sense to use a custom compression algorithm. However, a more reasonable answer would be to force `Content-Encoding: br` on the client and store a Brotli stream (which would be at most ~10% larger than Roadroller's output anyway).
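A minimal sketch of that approach, assuming Node.js and a precompressed file baked into storage (the file names here are hypothetical): the server sends the Brotli stream as-is and lets the browser decompress it.

```js
// Minimal sketch (hypothetical file names, assuming Node.js): serve a
// precompressed Brotli stream directly instead of shipping a JS decoder.
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  if (req.url === '/app.js') {
    res.writeHead(200, {
      'Content-Type': 'text/javascript',
      // Forced unconditionally: the device has no room for an
      // uncompressed copy, so Accept-Encoding is not negotiated.
      'Content-Encoding': 'br',
    });
    fs.createReadStream('app.js.br').pipe(res);
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(8080);
```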