It's worth remembering, if you're applying all these techniques to optimize a page for fast mobile display, that most hardware is CPU-limited more than network-limited during a good part of displaying most webpages.
Displaying a JPEG in multiple passes uses significantly more CPU. Rather than decoding the JPEG once and putting it on the screen, you end up decoding it 5, 10, or even 20 times, and have to rescale it, render any overlapping text or effects, composite it and put it on the display every time.
Some hardware has accelerated JPEG decoding, but it will usually still have to render overlapping text, borders, or clip masks with the CPU. The frequent back-and-forth between GPU and CPU ends up being a big overhead too.
Your optimized JPEG file might theoretically render decently with fewer bytes downloaded, but your overall page load might be delayed by a second or more due to the extra rendering required.
> Displaying a JPEG in multiple passes uses significantly more CPU
No, it doesn't. It became a meme mainly because libjpeg-turbo v1 didn't have optimizations for progressive coding; that's been fixed in v2. Progressive rendering keeps state and incrementally updates it. Browsers throttle refresh speed to avoid expensive edge cases, and there has been a lot of investment in browsers to make compositing dirt cheap.
There are ways to make incomplete JPEG decoding even faster, e.g. decoding just the DC passes (that's 1/64th of the memory and no IDCT), but the feedback I've got from the maintainer of libjpeg-turbo and from browser vendors is that it's at best "nice to have" territory, and that progressive JPEG overhead is a non-issue in practice.
My experience tuning the Chromium rendering pipeline doesn't match this. The actual jpeg decode is fast, but all the rest of the drawing and compositing required is sloooow. For one thing, the skia graphics library has to reprocess the drawlist for all things in the layer, and the drawlist (including the elements outside the clipping rect of the image) needs to be serialised and sent to the GPU process each time for every update. Perhaps I'm missing something tho.
It sounds like OP (no offence to them) is missing the context? Decoding a JPEG itself can be super fast, but if laying out every element that overlaps that JPEG is slow, then a progressive JPEG is still going to cause speed issues.
Haven't seen images load like that since the dial-up modem days.
I kludged together a node.js application in 2013 to load JPEG SOS segments separately to the browser. The idea was to tie it to depth in a VR application, like level-of-detail maps in game engines, but 'online'. Turned out no browser liked that much, so I dropped the project.
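For anyone curious how the scan boundaries can be found: here's a minimal sketch in plain Python (no dependencies) that walks the top-level JPEG marker segments and records the offset of each Start-of-Scan (SOS, 0xFFDA) marker. Splitting the file at those offsets gives roughly the chunks a server could stream one scan at a time; filenames and the actual delivery strategy are left out.

```python
def sos_offsets(data: bytes) -> list[int]:
    """Return byte offsets of every top-level Start-of-Scan (0xFFDA) marker."""
    offsets = []
    i = 2                                    # skip SOI (0xFFD8)
    while i + 1 < len(data):
        if data[i] != 0xFF:
            i += 1
            continue
        marker = data[i + 1]
        if marker == 0xFF:                   # fill byte before a marker
            i += 1
        elif marker == 0xD9:                 # EOI
            break
        elif 0xD0 <= marker <= 0xD8:         # standalone markers (RSTn, SOI)
            i += 2
        elif marker == 0xDA:                 # SOS: header, then entropy-coded data
            offsets.append(i)
            i += 2 + int.from_bytes(data[i + 2:i + 4], "big")
            while i + 1 < len(data):         # inside a scan, 0xFF is stuffed or RSTn
                if data[i] == 0xFF and data[i + 1] not in (0x00, *range(0xD0, 0xD8)):
                    break                    # next real marker (another SOS, EOI, ...)
                i += 1
        else:                                # every other segment carries its length
            i += 2 + int.from_bytes(data[i + 2:i + 4], "big")
    return offsets

# len(sos_offsets(open("photo.jpg", "rb").read())) is the number of scans:
# 1 for a baseline JPEG, typically around 10 for a progressive one.
```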
Probably-stupid question from a non-specialist: if you're digitising an existing real-world picture and want really high fidelity, then why would you convert it to a lossy format like JPEG?
Fair question. The image was assembled way back in 2014 or earlier from a large number of tiled JPEG source images, so the quality had already been compromised. Also, JPEGs are essentially viewable on any device this side of 1990 and way smaller than lossless formats. These days I would consider webp in preference to JPEG.
Could progressive mode be used to serve thumbnails by just truncating the image at a suitable point, or does the spec (and so, decoders) expect the whole image to eventually arrive?
You don't need the whole image. If you stop at the DC passes (~12-15% of the file) you get exactly 1/8th the resolution of the image, although scaled without gamma correction.
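If you want to try this yourself, here's a rough sketch with Pillow; photo.jpg is assumed to be a progressive-encoded JPEG, and the 15% cut-off is just an approximation of "the DC passes plus a bit".

```python
import io
from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True      # let Pillow decode a file that stops early

data = open("photo.jpg", "rb").read()       # assumed to be progressive-encoded
partial = io.BytesIO(data[: len(data) * 15 // 100])   # keep roughly the first 15%

im = Image.open(partial)
im.load()                                   # decodes whatever scans arrived
im.save("preview.png")                      # soft/blocky preview at full dimensions
```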
With HTTP/2 you can micromanage delivery of JPEG scans to deliver placeholders quickly, and delay delivery of unnecessary levels of detail:
Since progressive JPEGs are displayed while downloading and the connection could just be closed at any moment anyway ... I don't think that'd be a problem. Whether that's more efficient than an extra thumbnail is probably the more interesting question.
If you have time and space to pre-generate thumbnails, it's probably not a significant win, but I think it could work well for displaying local thumbnails of JPEGs, like from a camera.
If you're browsing a directory of hundreds of large (e.g. 10+ MB) JPEG photographs, generating the thumbnails by fully decompressing all of them would take a while. "Progressive thumbnails" that only decompress the first ~100 KB would be much faster.
You can do that even with non-progressive JPEGs, as you can use just the low frequency terms from the discrete cosine transform (the same data that comes first with progressive ordering).
You would still have to read the entire JPEG in though, wouldn't you?
I'm not an expert on JPEG, but I think that if you want the macroblocks at the bottom of the image, you still need to un-Huffman all the blocks before them (since AFAIK there isn't a table indicating where each block starts). That means you have to read the entire JPEG from storage, only to throw away the vast majority of it.
Even if there were a way to magically predict where the low-frequency values of the image are stored, you'd still have to do tens of thousands of random reads just to get to them. Reading the whole file would be faster.
So if you have 500 photos and want to go through them and need some thumbnails: for non-progressive thumbnail generation you have to read 10 MB x 500 images = 5 GB of data, but with progressive thumbnails you only need the first 100 KB x 500 images = 50 MB of data.
As an aside, if you just want thumbnails, most digital cameras embed a small (120x160-ish) thumbnail in the EXIF header that can be quickly extracted by exiftool.
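For example, something like exiftool -b -ThumbnailImage photo.jpg > thumb.jpg dumps it, and if you'd rather do it in code, here's a rough sketch using the third-party piexif library (filenames are made up):

```python
import piexif   # third-party: pip install piexif

exif = piexif.load("IMG_0001.JPG")        # hypothetical camera file
thumb = exif.get("thumbnail")             # raw JPEG bytes of the embedded preview, or None
if thumb:
    with open("IMG_0001_thumb.jpg", "wb") as f:
        f.write(thumb)
```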
Sounds like it should be possible if you can terminate the transfer of the file early:
> libjpeg has some interesting features as well. Rather than decoding an entire full-resolution JPEG and then scaling it down, for instance (a common use case when generating thumbnails), you may set it up when decoding so that it will simply do the reduction for you while decoding. This takes less time and uses less memory compared with getting the full decompressed version and resampling afterward.
The paragraph you quoted simply means that the decompression library is able to decompress to a smaller-size raw image; it says nothing about the JPEG file format.
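True, it's a decoder feature rather than a file-format one, but it's easy to reach from scripting languages too. Pillow, for instance, exposes it through Image.draft(), which asks libjpeg for a 1/2, 1/4 or 1/8 scaled decode. A quick sketch (photo.jpg is just a placeholder):

```python
from PIL import Image

im = Image.open("photo.jpg")
# Request a reduced-size decode from libjpeg before any pixels are read;
# draft() picks a 1/2, 1/4 or 1/8 scaling that still covers the requested size.
im.draft("RGB", (im.width // 8, im.height // 8))
im.load()
print(im.size)    # roughly 1/8th of the original, without ever decoding full-res
```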
It all depends on what the client is doing when it's decoding, so YMMV, but progressive JPEG is not a panacea. In addition to the CPU concerns mentioned above there can be memory implications as well. Many progressive JPEG decoders require a full-sized destination buffer to decode, whereas baseline decoders usually only need one row of blocks. If some downsampling is required most of the time anyway, it can be done inline with baseline-encoded images. A progressive implementation may need an order n^2 memory footprint. If you have many pages with many images, all of which are going to require some client-side scaling, this could add up to a lot of unnecessary alloc/dealloc too.
But I encourage you to try both ways and pick a winner based on your own results.
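For a sense of scale of the buffer difference, here's a back-of-the-envelope calculation (purely illustrative numbers, not measurements from any particular decoder):

```python
# Hypothetical 4000x3000 photo, 8-bit RGB output.
width, height, channels = 4000, 3000, 3

progressive_buffer = width * height * channels   # full destination frame kept around
baseline_buffer    = width * 8 * channels        # one row of 8x8 blocks at a time

print(progressive_buffer // 2**20, "MiB")        # ~34 MiB
print(baseline_buffer // 2**10, "KiB")           # ~93 KiB
```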
I should think that if the quantization matrix (which is stored in the JPEG file) is known, then it should be possible to losslessly re-encode the JPEG file after it has been converted from JPEG into a lossless format (so that the DCT coefficients have been discarded). However, I do not know if any program actually does this, or how to write such a program. Another thing I considered is whether a decoder could be made to somehow improve the quality of the output by producing a picture which could have been the input to the encoder with the specified quantization matrix.
(Of course, all of this is not what the article is about (it is about progressive and multi-scan JPEG), but still it is my questions/comments anyways.)
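To make the round-trip idea concrete, here's a toy numpy/scipy sketch on a single 8x8 block, using an orthonormal 2-D DCT and a flat quantization matrix as stand-ins for the real JPEG transform and tables (real JPEG adds level shifting, per-frequency tables and integer IDCTs, so this only illustrates why knowing Q matters and where pixel rounding can still get in the way):

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
block = rng.integers(0, 256, (8, 8)).astype(float)   # a fake 8x8 pixel block
Q = np.full((8, 8), 16.0)                            # flat stand-in quantization matrix

coeffs  = np.round(dctn(block, norm="ortho") / Q)    # what an encoder would store
decoded = idctn(coeffs * Q, norm="ortho")            # what a decoder would output
# Re-encode the decoded (integer-rounded) pixels with the same known Q:
recoded = np.round(dctn(np.round(decoded), norm="ortho") / Q)

print(int((recoded == coeffs).sum()), "of 64 coefficients recovered exactly")
```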
> jpegtran can do some other things losslessly as well - flipping, cropping, rotating, transposing, converting to greyscale
OK, so those are only lossless in specific situations, when the image dimensions are a multiple of the DCT block size. And of course grayscale conversion and cropping are lossy by definition. A better way to say it is that it can perform some transformations without recomputing and recompressing the DCT, although if you pass the -optimize option you are still redoing the Huffman tables.
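For what it's worth, jpegtran's -perfect flag enforces exactly that caveat: it refuses to do the transform at all rather than silently trimming the edge blocks that can't be transformed losslessly. A rough sketch of invoking it (filenames are made up):

```python
import subprocess

# Rotate losslessly; -perfect makes jpegtran exit non-zero instead of
# dropping partial edge blocks, -copy all keeps the existing metadata.
result = subprocess.run(
    ["jpegtran", "-rotate", "90", "-perfect", "-copy", "all",
     "-outfile", "rotated.jpg", "input.jpg"]
)
if result.returncode != 0:
    print("rotation would not be fully lossless (dimensions not a multiple of the iMCU size)")
```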
I understand your point, but it's not necessarily a better way to say it. The full term is "generation loss", and it applies regardless of destructive edit operations.
Also note that JPEG supports arithmetic coding for somewhat better compression, but most encoders and decoders haven't supported it (do they currently?) because of patents, which only expired over the last few years.
The majority of software still doesn't support it. It's a chicken-and-egg problem --- almost no one uses arithmetic mode because no software supports it, and almost no software supports it because almost no one uses arithmetic mode.
JPEG and many other image formats support adding arbitrary metadata, so you could easily sign the image data and add it as an Exif tag. I'm not sure if anybody is doing it, though. It seems like a good idea; maybe we should start.
Here [1] is a paper evaluating the concept. The link to the code examples seems dead, but I bet you could cobble something together in a couple lines of shell script. The only difficult part is signing just the image data as opposed to the whole file (which you're going to modify by appending the signature itself).
Obviously, signing the image like that only guarantees that the owner of the key also generated the image. You're going to have to trust them and their sources that they haven't modified it.
If the camera itself did the signing in a secure manner, that would be a much stronger guarantee. You could rely on a JPEG or RAW file being generated from a Canon camera by validating Canon's signature. I think some cameras can do the signing part, but not so much the secure manner part[2].
In either case it's obviously trivial to strip the signature. So it doesn't help photographers who want to prevent reproduction of their works without attribution.
Finally, here[3]'s a Stack Overflow post on the topic.
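If anyone wants to play with the Exif-tag idea, here's a rough sketch using Pillow, piexif and the cryptography package. It signs only the decoded pixels (so the signature survives metadata edits) and abuses the ImageDescription tag to carry the hex-encoded signature, which is obviously not a standardized place for it; key management and public-key distribution are glossed over entirely.

```python
import piexif                                        # pip install piexif
from PIL import Image
from cryptography.hazmat.primitives.asymmetric import ed25519

key = ed25519.Ed25519PrivateKey.generate()           # in reality: a long-lived, protected key

# Sign only the decoded pixel data, not the whole file.
pixels = Image.open("photo.jpg").tobytes()
signature = key.sign(pixels)

# Stash the signature in an Exif tag and write a new file.
exif = piexif.load("photo.jpg")
exif["0th"][piexif.ImageIFD.ImageDescription] = signature.hex().encode()
piexif.insert(piexif.dump(exif), "photo.jpg", "photo_signed.jpg")

# Verification: re-read the pixels and check them against the embedded signature.
stored = piexif.load("photo_signed.jpg")["0th"][piexif.ImageIFD.ImageDescription]
key.public_key().verify(bytes.fromhex(stored.decode()),
                        Image.open("photo_signed.jpg").tobytes())   # raises if tampered
```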
Wow, thanks for the detailed reply. My initial thought was you could make sure evidence submitted in court hasn't been edited (i.e. photoshopping someone into a crime scene, or adding someone to a picture for an alibi), but there could be a few uses. The problem with Exif data is, as you say, that it's trivial to strip, but it might be a stepping stone towards another "secure" format (sjpeg?)
If you want a truly secure format for these kinds of applications, the best thing you can do is to sign the data externally (or wrap the whole file in some signed container, which seems to be the preferred solution for the EU's eIDAS and related stuff).
On the other hand, I have seen a totally insecure but effective hack for formats with embedded metadata: include some kind of value in there that is usually prominently displayed by the OS and applications, but store it in a somewhat broken way, such that applications trying to preserve the metadata will break it even more and it becomes unreadable.