Unfortunately both the article and the code suffer from a crucial misunderstanding of JPEG -- the 8x8 blocks do actually undergo a 2-dimensional DCT. Zig-zag scanning down to 1D comes later, after quantization, and is applied to the quantized DCT coefficients in order to maximize runs of zeros for the benefit of RLE.
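To make the ordering concrete, here's a minimal sketch in Python of where the zig-zag actually sits (scipy for the DCT; encode_block is just an illustrative name, and the flat quantization table is a placeholder, not a real JPEG table):

    import numpy as np
    from scipy.fft import dctn

    # (row, col) pairs in JPEG zig-zag order: walk each anti-diagonal,
    # alternating direction between diagonals.
    ZIGZAG = sorted(((r, c) for r in range(8) for c in range(8)),
                    key=lambda rc: (rc[0] + rc[1],
                                    rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

    def encode_block(block, qtable):
        coeffs = dctn(block - 128.0, norm='ortho')  # 2-D DCT of the spatial block
        quant = np.round(coeffs / qtable)           # quantization
        # Zig-zag comes last, on the quantized coefficients, so the
        # high-frequency zeros cluster at the end of the 1-D sequence.
        return np.array([quant[r, c] for r, c in ZIGZAG])

    qtable = np.full((8, 8), 16.0)  # flat placeholder table
    scan = encode_block(np.full((8, 8), 200.0), qtable)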
The code and the article assume that the scan is applied earlier, to the spatial data, in preparation for a 1D DCT, and as a consequence they mistakenly perform all of their operations in scan order. In reality, a 1D DCT of zigzag-scanned samples differs dramatically from a zigzag scan of 2D frequency coefficients.
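The two orderings are easy to tell apart numerically. A quick sketch (same ZIGZAG table as above, repeated so this runs standalone; any non-trivial block shows it):

    import numpy as np
    from scipy.fft import dct, dctn

    ZIGZAG = sorted(((r, c) for r in range(8) for c in range(8)),
                    key=lambda rc: (rc[0] + rc[1],
                                    rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

    block = np.arange(64.0).reshape(8, 8)

    # What JPEG does: 2-D DCT first, zig-zag the coefficients afterwards.
    two_d = dctn(block, norm='ortho')
    correct = np.array([two_d[r, c] for r, c in ZIGZAG])

    # What the patch effectively computes: zig-zag the spatial samples,
    # then take a 1-D DCT of that sequence.
    scanned = np.array([block[r, c] for r, c in ZIGZAG])
    patch = dct(scanned, norm='ortho')

    print(np.abs(correct - patch).max())  # nowhere near zero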
The technique still works to an extent, since it's still processing edges (just diagonally), and in some cases it produces an effect similar to the intended one. But it's likely less effective than a row-column or other genuinely 2D approach that models the actual transform and the behavior outlined in the blog post.
Surprisingly, this code was accepted into mozjpeg nearly a year ago and remains there in the form the blog post describes.
Clipping is required by JFIF, so you can't implement spilling in the decoder by default, even if you assume the source was 0-255 RGB. That makes the encoder, as in the article, the right place for this -- but I have mixed feelings about it, since it's most helpful for images that shouldn't be saved as JPEG at all.
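For what it's worth, the encoder-side idea is easy to demo in one dimension. A toy sketch, with coefficient truncation as a crude stand-in for JPEG's quantization loss and a deliberately exaggerated overshoot value (288 is picked for visibility here, not anything a real encoder would use):

    import numpy as np
    from scipy.fft import dct, idct

    row = np.array([64.0] * 4 + [255.0] * 4)  # hard edge into saturated white

    def lossy(samples, keep=6):
        # Crude model of JPEG loss: 1-D DCT, drop the top coefficients, invert.
        c = dct(samples - 128.0, norm='ortho')
        c[keep:] = 0
        return idct(c, norm='ortho') + 128.0

    plain = np.clip(lossy(row), 0, 255)  # decoder clipping, as JFIF requires

    # Encoder-side spilling: push the saturated samples past 255 before the
    # transform, so the ringing lands above 255 and the decoder's mandatory
    # clipping flattens it away.
    overshot = np.where(row == 255.0, 288.0, row)
    spilled = np.clip(lossy(overshot), 0, 255)

    print(plain.round())    # the white plateau dips visibly below 255
    print(spilled.round())  # the white plateau clips flat at 255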
There are a lot of photographic JPEGs that have black-and-white text too. Perhaps a caption. Perhaps a road sign in the scene. Reducing block noise seems better than saving such images as bajillion-byte PNGs.
The fundamental problem here is that the Nyquist frequency of displays is well below the human visual limit. Furthermore, the display doesn't even attempt to reconstruct a band-limited signal like an audio DAC would: it simply maps samples to pixels. This direct mapping lets you render images with much higher frequency components than you'd otherwise be able to, but as soon as you do limit the bandwidth of the signal (like the DCT is doing here) you get ringing.
Of course, this ringing is totally natural, and if the display operated "properly" from a signal processing perspective (like an audio DAC), you'd see it everywhere: a single-pixel line is already narrower than the band limit allows!
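A quick way to see it (the 64-sample row and the cutoff at a quarter of the sampling rate are arbitrary choices):

    import numpy as np

    line = np.zeros(64)
    line[32] = 1.0  # a one-pixel-wide white line on black

    # Band-limit it the way an "ideal" reconstruction would: keep the low
    # frequencies, zero everything above the cutoff.
    spectrum = np.fft.rfft(line)
    spectrum[16:] = 0
    limited = np.fft.irfft(spectrum, n=64)

    # The impulse is now a sinc-like blob with ripples -- including samples
    # that go below black -- spilling across its neighbours.
    print(limited[28:37].round(3))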
To really solve the problem, you have to stop treating pixels as samples, since they're not displayed that way. Instead, you should upsample the image to a higher resolution using a kernel which matches how a display renders pixels, perform your signal processing on this higher-resolution image, then downsample using the same kernel.
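In code, the plumbing of that idea looks roughly like this (a box kernel as a crude stand-in for how an LCD renders a pixel as a flat square; the 4x factor and the cutoff are arbitrary, and a real implementation would use a measured display kernel):

    import numpy as np

    K = 4  # upsampling factor

    def box_upsample(x, k=K):
        # Model display reconstruction: each sample becomes a flat run of k values.
        return np.repeat(x, k)

    def box_downsample(x, k=K):
        # The same kernel in reverse: average each run of k values.
        return x.reshape(-1, k).mean(axis=1)

    line = np.zeros(64)
    line[32] = 1.0  # the single-pixel line again

    hi = box_upsample(line)            # 256 samples, roughly "as displayed"
    spectrum = np.fft.rfft(hi)
    spectrum[16:] = 0                  # same absolute cutoff, now at the 4x rate
    hi_filtered = np.fft.irfft(spectrum, n=hi.size)
    out = box_downsample(hi_filtered)  # back to pixel space with the same kernel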