Unfortunately both the article and the code suffer from a crucial misunderstanding of JPEG -- the 8x8 blocks do actually undergo a 2-dimensional DCT. Zig-zag scanning down to 1D comes later, after quantization, and is applied to the quantized DCT coefficients in order to maximize runs of zeros for the benefit of RLE.
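To make the ordering concrete, here's a minimal sketch in Python of where the zig-zag actually sits (scipy for the DCT; encode_block is just an illustrative name, and the flat quantization table is a placeholder, not a real JPEG table):

    import numpy as np
    from scipy.fft import dctn

    # (row, col) pairs in JPEG zig-zag order: walk each anti-diagonal,
    # alternating direction between diagonals.
    ZIGZAG = sorted(((r, c) for r in range(8) for c in range(8)),
                    key=lambda rc: (rc[0] + rc[1],
                                    rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

    def encode_block(block, qtable):
        coeffs = dctn(block - 128.0, norm='ortho')  # 2-D DCT of the spatial block
        quant = np.round(coeffs / qtable)           # quantization
        # Zig-zag comes last, on the quantized coefficients, so the
        # high-frequency zeros cluster at the end of the 1-D sequence.
        return np.array([quant[r, c] for r, c in ZIGZAG])

    qtable = np.full((8, 8), 16.0)  # flat placeholder table
    scan = encode_block(np.full((8, 8), 200.0), qtable)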
The code and the article assume that the scan is applied earlier, to the spatial data, in preparation for a 1D DCT, and as a consequence they mistakenly perform all of their operations in scan order. In reality, a 1D DCT of zigzag-scanned samples differs dramatically from a zigzag scan of 2D frequency coefficients.
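The two orderings are easy to tell apart numerically. A quick sketch (same ZIGZAG table as above, repeated so this runs standalone; any non-trivial block shows it):

    import numpy as np
    from scipy.fft import dct, dctn

    ZIGZAG = sorted(((r, c) for r in range(8) for c in range(8)),
                    key=lambda rc: (rc[0] + rc[1],
                                    rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

    block = np.arange(64.0).reshape(8, 8)

    # What JPEG does: 2-D DCT first, zig-zag the coefficients afterwards.
    two_d = dctn(block, norm='ortho')
    correct = np.array([two_d[r, c] for r, c in ZIGZAG])

    # What the patch effectively computes: zig-zag the spatial samples,
    # then take a 1-D DCT of that sequence.
    scanned = np.array([block[r, c] for r, c in ZIGZAG])
    patch = dct(scanned, norm='ortho')

    print(np.abs(correct - patch).max())  # nowhere near zero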
The technique still works to an extent, since it's still processing edges (just diagonally), and in some cases it produces an effect similar to the intended one. But it's likely less effective than a row-column or other genuinely 2D approach that models the actual transform and the behavior outlined in the blog post.
Surprisingly, this code was accepted into mozjpeg nearly a year ago and remains there in the form the blog post describes.
Clipping is required by JFIF, so you can't implement spilling in the decoder by default, even if you assume the source was 0-255 RGB. That makes the encoder, as in the article, the right place for this -- but I have mixed feelings about it, since it's most helpful for images that shouldn't be saved as JPEG at all.
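For what it's worth, the encoder-side idea is easy to demo in one dimension. A toy sketch, with coefficient truncation as a crude stand-in for JPEG's quantization loss and a deliberately exaggerated overshoot value (288 is picked for visibility here, not anything a real encoder would use):

    import numpy as np
    from scipy.fft import dct, idct

    row = np.array([64.0] * 4 + [255.0] * 4)  # hard edge into saturated white

    def lossy(samples, keep=6):
        # Crude model of JPEG loss: 1-D DCT, drop the top coefficients, invert.
        c = dct(samples - 128.0, norm='ortho')
        c[keep:] = 0
        return idct(c, norm='ortho') + 128.0

    plain = np.clip(lossy(row), 0, 255)  # decoder clipping, as JFIF requires

    # Encoder-side spilling: push the saturated samples past 255 before the
    # transform, so the ringing lands above 255 and the decoder's mandatory
    # clipping flattens it away.
    overshot = np.where(row == 255.0, 288.0, row)
    spilled = np.clip(lossy(overshot), 0, 255)

    print(plain.round())    # the white plateau dips visibly below 255
    print(spilled.round())  # the white plateau clips flat at 255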
There are a lot of photographic JPEGs that have black-and-white text too. Perhaps a caption. Perhaps a road sign in the scene. Reducing block noise seems better than saving such images as bajillion-byte PNGs.
The fundamental problem here is that the Nyquist frequency of displays is well below the human visual limit. Furthermore, the display doesn't even attempt to reconstruct a band-limited signal like an audio DAC would: it simply maps samples to pixels. This direct mapping lets you render images with much higher frequency components than you'd otherwise be able to, but as soon as you do limit the bandwidth of the signal (like the DCT is doing here) you get ringing.
Of course, this ringing is totally natural, and if the display operated "properly" from a signal processing perspective (like an audio DAC), you'd see it everywhere: a single-pixel line is already narrower than the band limit allows!
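A quick way to see it (the 64-sample row and the cutoff at a quarter of the sampling rate are arbitrary choices):

    import numpy as np

    line = np.zeros(64)
    line[32] = 1.0  # a one-pixel-wide white line on black

    # Band-limit it the way an "ideal" reconstruction would: keep the low
    # frequencies, zero everything above the cutoff.
    spectrum = np.fft.rfft(line)
    spectrum[16:] = 0
    limited = np.fft.irfft(spectrum, n=64)

    # The impulse is now a sinc-like blob with ripples -- including samples
    # that go below black -- spilling across its neighbours.
    print(limited[28:37].round(3))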
To really solve the problem, you have to stop treating pixels as samples, since they're not displayed that way. Instead, you should upsample the image to a higher resolution using a kernel which matches how a display renders pixels, perform your signal processing on this higher-resolution image, then downsample using the same kernel.
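In code, the plumbing of that idea looks roughly like this (a box kernel as a crude stand-in for how an LCD renders a pixel as a flat square; the 4x factor and the cutoff are arbitrary, and a real implementation would use a measured display kernel):

    import numpy as np

    K = 4  # upsampling factor

    def box_upsample(x, k=K):
        # Model display reconstruction: each sample becomes a flat run of k values.
        return np.repeat(x, k)

    def box_downsample(x, k=K):
        # The same kernel in reverse: average each run of k values.
        return x.reshape(-1, k).mean(axis=1)

    line = np.zeros(64)
    line[32] = 1.0  # the single-pixel line again

    hi = box_upsample(line)            # 256 samples, roughly "as displayed"
    spectrum = np.fft.rfft(hi)
    spectrum[16:] = 0                  # same absolute cutoff, now at the 4x rate
    hi_filtered = np.fft.irfft(spectrum, n=hi.size)
    out = box_downsample(hi_filtered)  # back to pixel space with the same kernel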