Pay attention to just how good WebP is at _lossless_ compression though!
I've always thought that one was flying under the radar. Most get stuck on WebP not offering tangible enough benefits (or even doing worse) over MozJPEG encoding, but WebP _lossless_ is absolutely fantastic for performance/speed! PNG or even OptiPNG is far worse. And it's very well supported online now, leaving the horrible lossless AVIF in the dust too, of course.
Lossless WebP is very good indeed. The main problem is that it is not very future-proof since it only supports 8-bit. For SDR images that's fine, but for HDR this is a fundamental limitation that is about as bad as GIF's limitation to 256 colors.
Neither this limit nor the 4 GB limit existed in my first version. These were artificially introduced to "match" the properties of lossy WebP. We could have done better there.
Were you involved in creating WebP? If so that's super cool! Why would they want to match webp's lossy compression though? To make it more uniform? And do you know why lossy WebP had such a limitation in the first place? Thank you!
I designed the WebP lossless format, wrote the spec, and implemented the first encoder.
The constraint was in lossy WebP to facilitate exact compatibility with the VP8 specification, in the hope that it would allow hardware decoding and encoding of WebP images using VP8 hardware.
Hardware encoding and decoding were never used, but the limitation stuck.
There was no serious plan to do hardware lossless, but the constraint was copied for "reducing confusion".
I didn't and don't like it much, since as a result more PNG images couldn't be represented as lossless WebP.
Wow, that really sucks. I appreciate the explanation as well as your frustration with it.
My desktop had a mixture of PNG and WebP files solely because of this limitation. I use the past tense because they've now all been converted to JPEG XL lossless.
Another area of incompatibility with PNG was 16-bit coding. I had a plan to add it by simply sending two 8-bit images, where the second image, containing the 8 least significant bits, would be predicted to be the same as the first. That way it would not be perfect, but it would be 100x better than how PNG deals with it. WebP had another plan for layers and tiles that never materialized, and as a consequence WebP is stuck at 8 bits.
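To illustrate the idea, a rough numpy sketch (my own illustration of the scheme as described, not anything from an actual spec): split each 16-bit channel into its high and low bytes, and code the low byte only as a residual against the high byte.

import numpy as np

def split_16bit(channel16):
    msb = (channel16 >> 8).astype(np.uint8)        # first 8-bit image
    lsb = (channel16 & 0xFF).astype(np.uint8)      # second 8-bit image
    # Predict the low byte to equal the high byte and code only the residual.
    # For 8-bit content stored as 16-bit by bit replication (v = 257 * k),
    # the residual is exactly zero, so it costs almost nothing.
    residual = (lsb.astype(np.int16) - msb.astype(np.int16)) % 256
    return msb, residual.astype(np.uint8)

def join_16bit(msb, residual):
    lsb = (msb.astype(np.int16) + residual.astype(np.int16)) % 256
    return (msb.astype(np.uint16) << 8) | lsb.astype(np.uint16)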
Ah, I didn't know this, and I agree this is a fairly big issue, and increasingly so over time. I think smartphones in particular hastened the demand for HDR quite a bit; it was once a premium/enthusiast feature you had to explicitly buy into.
I haven't run across websites that serve up HDR images, and I am not sure I would notice the difference. WebP seems appropriately named and optimized for image delivery on the web.
Maybe you are thinking of high bit depth for archival use? I can see some use cases there where 8-bit is not sufficient, though personally I store high bit depth images in whatever raw format was produced by my camera (which is usually some variant of TIFF).
8-bit can have banding even without "HDR". Definitely not enough. 10 bit HDR video is becoming more common, and popularity for images will follow. Adoption is hampered by the fact that Windows has bad HDR support, but it all works plenty well on macOS and mobile platforms.
It would be nice if software produced non-tone-mapped HDR all the time, and the windowing system or the monitor did the (local) tone mapping.
No, you don't want that. You want your windowing system & UI/graphics layers working in display native range (see eg, Apple's EDR). The consistency of appearance of the SDR range is too important to leave it up to unknown tone mapping.
Also it's too power expensive to have your display in HDR if you're only showing SDR UIs, which not only is the common reality but will continue to be so for the foreseeable future.
Requiring every game, photo viewing app, drawing program, ... every application to decide how to do its HDR->SDR mapping seems an unnecessary complication due to poor abstractions.
The locality of local tone mapping (the ideal approach to HDR->SDR mapping) would expose the window boundaries. Two photos, or two halves of the same photo, in different windows (as opposed to being in the same window) would show an artificial discontinuity, with the correction fields artificially contained within each window instead of spanning the user's visual field as well as possible.
Every local tone mapping needs to make an assumption about the surrounding colors: whether the window is surrounded by black, gray, colored or bright light should influence how the tone mapping is done at the borders. This information is not available to an app: it can only be done at the windowing-system level or in the monitor.
The higher the quality of the HDR->SDR mapping in a system, the more opportunity there is to limit the maximum brightness, and thus also the opportunity for energy savings.
> Requiring every game [..] to decide how to do its HDR->SDR seems unnecessary complication due to poor abstractions.
Games already do this regardless and it's part of their post processing pipelines that also govern aspects of their environmental look.
What's missing is HDR->Display, which is why HDR games have you go through a clunky calibration process to attempt to reverse out what the display is going to do to the PQ signal, when what the game actually wants is what macOS/iOS and now Android just give them: the exact amount of HDR headroom, which the display doesn't manipulate.
As for your other examples, being able to target the display native range doesn't mean you have to. Every decent os has a compositor API that lets you offload this to the system.
Yes, if local tone mapping is done by the operating system (windowing system or the monitor itself), then there is a chance that borders of windows are done appropriately within their spatial context.
For things where you actually care about lossless you probably also don't care about HDR.
HDR is (or can be) good for video & photography, but it's absolutely ass for UI.
Besides, you can just throw a gainmap approach at it if you really care. Works great with jpeg, and gainmaps are being added to heif & avif as well, no reason jpegxl couldn't get the same treatment. The lack of "true 10-bit" is significantly less impactful at that point
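For readers who haven't run into gainmaps: the reconstruction side is essentially an SDR base image multiplied by a per-pixel gain stored as a small 8-bit map of log2 gain values. A simplified sketch of that step (my own illustration; real gainmap formats add offsets, a gamma on the map, and adaptation to the display's available headroom):

import numpy as np

def apply_gainmap(sdr_linear, gainmap, log2_gain_min, log2_gain_max):
    # sdr_linear: float array in [0, 1] (linear light); gainmap: uint8, same size.
    g = gainmap.astype(np.float32) / 255.0
    log2_gain = log2_gain_min + g * (log2_gain_max - log2_gain_min)
    return sdr_linear * np.exp2(log2_gain)   # HDR linear output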
Gainmaps don't solve 8-bit Mach banding. If anything you get more banding: two bandings, one from each of the two 8-bit fields, multiplied together.
Gainmaps "solve" the problem of computing a local tone mapping by declaring that it needs to be done at server side or at image creation time rather than at viewing time.
My prediction: Gainmaps are going to be too complex of a solution for us as a community and we are going to find something else that is easier. Perhaps we could end up standardizing a small set of local tone mapping algorithms applied at viewing time.
> Gainmaps "solve" the problem of computing a local tone mapping by declaring that it needs to be done at server side or at image creation time rather than at viewing time.
Which was already the case. A huge amount of phone camera quality comes from advanced processing, not sensor improvements. Trying to get that same level of investment in all downstream clients is both unrealistic and significantly harder. A big part of why Dolby Vision looks better is just that Dolby forces everyone to use their tone mapping, and client consistency is critical.
Gainmaps also avoid the proprietary metadata disaster that plagues HLG/PQ video content.
> If anything you get more banding: two bandings, one banding from each of the two 8-bit fields multiplied together.
The math works out such that you get the equivalent of something like 9 bits of depth, but you're also not wasting bits on colors and luminance ranges you aren't using, like you are with BT.2020 HLG or PQ.
I didn't try it out, but I don't see the 9 bit coming. I feel it gives about 7.5 bits.
Mixing two independent quantization sources will lead to more error.
Some decoding systems, such as traditional JPEG, do not specify results exactly, so bit-perfect quantization-aware compensation is not going to be possible.
The context of this thread is lossless webp, there aren't any compression artifacts to deal with.
Go try the math out; the theoretical maximum is higher than 9 bits, but an 8.5-9 bit equivalent is very achievable in practice. With two lossless 8-bit sources this is absolutely adequate for current displays, especially since, again, you're not wasting bits on things like PQ's absurd range.
Will this be added to webp? Probably not, it seems like a dead format regardless. But 8 bit isn't the death knell for HDR support as is otherwise believed.
You just cannot reach the best quality with 8 bits, not in SDR, not in HDR, not with gainmap HDR. Sometimes you don't care for a particular use case, and then 8 bits becomes acceptable. Many use cases remain where degradation by a compression system is unacceptable or creates too many complications.
Find me a single example of a UI in HDR for the UI components, not for photos/videos.
> Even SDR needs 10-bit in a lot of situations to not have banding.
You've probably been looking at 10-bit HDR content on an 8-bit display panel anyway (true 10-bit displays don't exist in mobile yet, for example). 8 bits works fine with a bit of dithering.
Yes, but the place where you want dithering is in the display, not in the image. Dithering doesn't work with compression because it's exactly the type of high frequency detail that compression removes. It's much better to have a 10 bit image which makes the information low frequency (and lets the compression do a better job since a 10 bit image will naturally be more continuous than an 8 bit image since there is less rounding error in the pixels), and let the display do the dithering at the end.
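A minimal sketch of that last step, assuming a 10-bit image being shown on an 8-bit panel (illustrative random dither only; real displays typically use ordered or temporal dithering):

import numpy as np

def dither_10bit_to_8bit(img10, rng=np.random.default_rng(0)):
    # img10: uint16 array holding 10-bit values (0..1023).
    # Add roughly half an output LSB of noise before quantizing so that
    # banding turns into fine-grained noise the eye averages away.
    noise = rng.uniform(-0.5, 0.5, img10.shape)
    return np.clip(np.round(img10 / 4.0 + noise), 0, 255).astype(np.uint8)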
Gainmaps only take a lot of space if implemented poorly in the software creating the gainmap. On Pixel phones they take up 2% of the image size, by being stored in quarter resolution and scaling back up during decoding, which works fine on all gainmap-supported devices.
They're more prone to banding, but it's definitely not a problem with all images.
I guess I should say, take up lots of space unless you're OK with lower quality. In any case, gainmaps are entirely unrelated to the 8 bit vs 10 bit question. The more range you have (gamut or brightness) the worse 8 bit is, regardless of whether you're using a gainmap or not. And you can use gainmaps with 10 bit images.
I would, at first blush, disagree with that characterization. Dialogue means more fine-grained strokes and more individual, independent “zones” to encode.
IIRC the way you encode grayscale in WebP is a SUBTRACT_GREEN transform that makes the red and blue channel 0 everywhere, and then use a 1-element prefix code for R and B, so the R and B for each pixel take zero bits. Same idea with A for opaque images. Do you know why that's not good enough?
If I had just added 128 to the residuals, all remaining prediction arithmetic would have worked better and it would have given 1 % more density.
This is because most of the related arithmetic for predicting pixels is done in unsigned 8-bit arithmetic. Subtract-green makes such predictions often cross the 0 -> 255 boundary, and then averaging, deltas etc. make little sense and add to the entropy.
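A toy example of the wrap-around (illustrative only, not libwebp code): for near-gray pixels R ≈ G, so the subtract-green residuals straddle the 0/255 boundary and an averaging predictor lands far from all of them, while a +128 bias keeps them in one cluster.

import numpy as np

r_minus_g = np.array([-2, -1, 0, 1, 2])        # residuals for near-gray pixels
unbiased = r_minus_g % 256                     # [254, 255, 0, 1, 2]
biased = (r_minus_g + 128) % 256               # [126, 127, 128, 129, 130]

# Averaging two neighbours, as spatial predictors do in unsigned 8-bit arithmetic:
print((int(unbiased[0]) + int(unbiased[3])) // 2)   # 127, far from both 254 and 1
print((int(biased[0]) + int(biased[3])) // 2)       # 127, right inside the cluster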
WebP also has a near-lossless encoding mode based on the lossless WebP specification. It is mostly unadvertised, but should be preferred over real lossless in almost every use case. Often you can halve the size without additional visible loss.
Is this mode picked automatically in "mixed" mode?
Unfortunately, that option doesn't seem to be available in gif2webp (I mostly use WebP for GIF images, as animated AVIF support is poor in browsers and that has an impact on interoperability).
Not in gif2webp, no. It is available in img2webp as a global (not per-frame) option. It looks like when you pass it, it will be used for all lossless encoding, including when "-mixed" tries the lossless mode.
WebP near-lossless is very far behind that kind of visually lossless in compression density, still 2-3x more bits I think, so there is no reason to compare them. The near-lossless (at settings 60 and 80) is closer to pixel perfect no matter how much you zoom, whereas Jon's "visually lossless" is what I'd rather call the usual very high quality lossy, without pixel-precision guarantees.
If we replace those heuristics with a search that tries out which of the values is closest to the original, we should get better quality, especially at the lowest bitrates where smoothing is important.
It is unlikely that there will be any bitstream changes in JPEG XL. There is still a lot of potential for encoder improvements within the current bitstream, both for lossy and for lossless.
Last time I tested cwebp it did not handle (PNG) color spaces correctly so the result of a supposedly lossless conversion actually looked different from the original. What good is better lossless compression if it is not actually lossless visually?
It is ironic that you say this, because when I disabled WebP in my browser after it had a huge security vulnerability, Discord was the only site which broke instead of immediately just serving me more reasonable image formats.
At the very low quality settings, it's kinda remarkable how JPEG manages to keep a sharper approximation of detail that preserves the holistic quality of the image better, in spite of the obvious artifacts making it look like a mess of cubism when examined closely. It's basically converting the image into some kind of abstract art style.
It is because JPEG is given 0.5 bits per pixel, where JPEG XL and AVIF are given around 0.22 and 0.2.
These images attempt to be at equal level of distortion, not at equal compression.
Bpps are reported beside the images.
In practice, use of quality 65 is rare on the internet and only seen at the lowest-quality-tier sites. Quality 75 seems to be the usual poor quality and quality 85 the average. I use quality 94 YUV444 or better when I need to compress.
Bitrates are in the left column, jpg low quality is the same size as jxl/avif med-low quality (0.4bpp), so you should compare the bottom left picture to the top mid and right pictures.
JPEG bitrates are higher, so all it means is that SSIMULACRA2 is the wrong metric for this test. It seems that SSIMULACRA2 heavily penalizes blocking artifacts but doesn't much care about blur. I agree that the JPEG versions look better at the same SSIMULACRA2 score.
Humans generally tend to prefer smoothing over visible blocking artifacts. This is especially true when a direct comparison to the original image is not possible. Of course different humans have different tastes, and some do prefer blocking over blur. SSIMULACRA2 is based on the aggregated opinions of many thousands of people. It does care more about blur than metrics like PSNR, but maybe not as much as you do.
Human ratings are expensive and clumsy so people often use computed aka objective metrics, too.
The best OSS metrics today are Butteraugli, DSSIM and SSIMULACRA. The author is using one of them. None of the codecs was optimized for those metrics, except jpegli partially.
Yes, that was my takeaway from this that JPEG keeps edge sharpness really well (e.g. the eyelashes) while the jxl and avif smooth all detail out of the image.
No, JXL and AVIF keep the same level of edge sharpness but without all the blocking artifacts when given the same amount of bits per pixel as the lowest-quality JPEG.
I do not understand why this article focuses so much on encode speed, but for decode, which I believe represents 99% of usage in this web-connected world, give a cursory...
> Decode speed is not really a significant problem on modern computers, but it is interesting to take a quick look at the numbers.
Anything more than 100 MB/s is considered "enough" for the internet because at that point your bottleneck is no longer decoding. Most modern compression algorithms are asymmetric, that is, you can spend much more time on compression without significantly affecting the decompression performance, so it is indeed less significant once the base performance is achieved.
During the design process of Pik/JPEG XL I experimented on decode speed personally, in order to form an opinion about this. I tried a special version of Chrome that artificially throttled the image decoding. Once the decoding speed got to around 20 megapixels per second, the benefit from additional speed was difficult to notice. I tried 2, 20 and 200 megapixels per second throttlings. This naturally depends on image sizes and uses too.
There was a much more noticeable impact from progressive images, and even from sequential images displayed in a streaming manner during the download. As a rule of thumb, sequential top-to-bottom streaming feels 2x faster than waiting for the full render, and progressive feels 2x faster than sequential streaming.
This matters way more for video (where you are decoding 30 images per second continuously) than it does for still images. For still images, the main thing that drains your battery is the display, not the image decoding :)
But in any case, there are no _major_ differences in decoding speed between the various image formats. The difference caused by reducing the transfer size (network activity) and loading time (user looking at a blank screen while the image loads) is more important for battery life than the decoding speed itself. Also the difference between streaming/progressive decoding and non-streaming decoding probably has more impact than the decode speed itself, at least in the common scenario where the image is being loaded over a network.
Assuming the websites are using images of appropriate dimensions (that is, not using huge images and relying on browser downscaling, which is a bad practice in any case), you can quite easily do the math. A 1080p screen is about 2 megapixels, a 4K screen is about 8 megapixels. If your images decode at 50 Mpx/s, that's 25 full screens (or 6 full screens at 4K) per second. You need to scroll quite quickly and have a quite good internet connection before decode speed will become a major issue, whether for UX or for battery life. Much more likely, the main issue will be the transfer time of the images.
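Spelling that arithmetic out (same rounded figures as above):

screen_1080p_mpx = 1920 * 1080 / 1e6    # ~2.07 Mpx per full 1080p screen
screen_4k_mpx = 3840 * 2160 / 1e6       # ~8.29 Mpx per full 4K screen
decode_mpx_per_s = 50.0
print(decode_mpx_per_s / screen_1080p_mpx)   # ~24 full 1080p screens per second
print(decode_mpx_per_s / screen_4k_mpx)      # ~6 full 4K screens per second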
Aside from all the points you raise, I find the discussion about battery life a little absurd in light of how negligible it is compared to the impact of poorly written JavaScript in web apps. For example, I noticed this morning that my bank somehow pins one CPU thread to 100% usage whenever I have their internet banking site open, even when nothing is being done. AFAIK there is no cryptocurrency nonsense going on, and the UI latency is pretty good too, so my best guess is that their "log out automatically after ten minutes of inactivity" security feature is implemented through constant polling.
And the one you're replying to is also talking about battery life. The energy needed to display an image for a few seconds is probably higher than the energy needed to decode it.
Agreed. For web use they all decode fast enough. Any time difference might be in progression or streaming decoding, vs. waiting for all the data to arrive before starting to decode.
For image gallery use with camera-resolution photographs (12-50 Mpixels) it can be more fun to have 100+ Mpixels/s, even 300 Mpixels/s.
I wasn't able to convince myself of that when approaching the question with back-of-the-envelope calculations, published research and prototypes.
Very few applications are constantly decoding images. Today a single image is often decoded in a few milliseconds, but watched 1000x longer. If you 10x or even 100x the energy consumption of image decoding, it is still not going to compete with the display, the radio and video decoding as a battery drain.
As others pointed out, that's why JPEG XL's excellent support for progressive decoding is important. Other formats either do not support progressive decoding at all or make it optional, so it cannot even be compared at this point. In other words, the table can be regarded as evidence that you can have both progressive decoding and performance at once.
There is no "throughput vs latency" here, there is no "start-up time" for decoding an image that's already in RAM. If a decoder decodes at 100MiB/s, and a picture is 10MiB, it's decoded in 0.1 seconds. If the decoder decodes at 1 MiB/s, the same picture is decoded in 10 seconds.
This isn't true for web use cases. There, being able to start decoding when the first bits arrive, rather than waiting for the last, can make a difference (JPEG XL does this a lot better than other image formats because it sends all the DC coefficients first, which lets the website display a low-resolution version of the image and then fill in the detail as the rest of the image comes through).
That's about progressive decoding, which isn't what this benchmark is focusing on. When benchmarking decode speed, throughput is a perfectly good metric. If the article wanted to talk about progressive decode performance, it would require a complete redesign, not just a change in metric from "throughput" to "latency".
Is it practical to use hardware video decoders to decode the image formats derived from video formats, like AVIF/AV1 and HEIC/HEVC? If so, that could be a compelling reason to prefer them over a format like JPEG XL, which has to be decoded in software on all of today's hardware. Everything has HEVC decode, and AV1 decode is steadily becoming a standard feature as well.
No browser bothers with hardware decode of WebP or AVIF even if it is available. It is not worth the trouble for still images. Software decode is fast enough, and can have advantages over hw decode, such as streaming/progressive decoding. So this is not really a big issue.
No, not really - mostly because setup time and concurrent-decode limitations of HW decoders across platforms tend to undermine any performance or battery gains from that approach. As far as I know, not even mobile platforms bother with it in their native decoders for any format.
In some use cases the company is paying for the encoding, but the client is doing the decoding. As long as the client can decode the handful of images on the page fast enough for the human to not notice, it's fine. Meanwhile any percentage improvement for encoding can save real money.
Companies and economies that optimize their clients' costs as if they were their own can see diffuse benefits from it. Even considering 0.01% of that cost can lead to better decisions. If it makes the website load faster or allows more realistic product pictures, it can lead to faster growth or fewer returns, etc.
But the server doesn't necessarily have unlimited time to encode those images. Each of those 1 million images needs to be encoded before it can be sent to a client.
The inclusion of QOI in the lossless benchmarks made me smile. It's a basically irrelevant format, that isn't supported by default by any general-public software, that aims to be just OK, not even good, yet it has a spot on one of these charts (non-photographic encoding). Neat.
GameMaker Studio has actually rather quickly jumped onto the QOI bandwagon, having 2 years ago replaced PNG textures with QOI (and added BZ2 compression on top) and found a 20% average reduction in size. So GameMaker Studio and all the games produced with it in the past 2 years or so do actually use QOI internally.
Not something a consumer knowingly uses, but also not quite irrelevant either.
That feels a little sad. If QOI had anything going for it (fast single-threaded decoding speed for photographic content), adding bz2 most certainly removed it. They could have just used WebP lossless and it would have been faster and smaller.
bz2 is obsolete. It’s very slow, and not that good at compressing. zstd and lzma beat it on both compression and speed at the same time.
QOI’s only selling point is simplicity of implementation that doesn’t require a complex decompressor. Addition of bz2 completely defeats that. QOI’s poorly compressed data inside another compressor may even make overall compression worse. It could have been a raw bitmap or a PNG with gzip replaced with zstd.
And yet it didn't reach the Pareto frontier! It's quite obvious in hindsight though: QOI decoding is inherently sequential and can't be easily parallelized.
Of course it didn't; it wasn't designed to be either the fastest or the best. Just OK and simple. Yet in some cases it's not completely overtaken by the competition, and I think that's cool.
I don't believe QOI will ever have any sort of real-world practical use, but that's quite OK and I love it for it has made me and plenty of others look into binary file formats and compression and demystify it, and look further into it. I wrote a fully functional streaming codec for QOI, and it has taught me many things, and started me on other projects, either working with more complex file formats or thinking about how to improve upon QOI. I would probably never have gotten to this point if I tried the same thing starting with any other format, as they are at least an order of magnitude more complex, even for the simple ones.
> Of course it didn't; it wasn't designed to be either the fastest or the best. Just OK and simple. Yet in some cases it's not completely overtaken by the competition, and I think that's cool.
Actually, there was a big push to add QOI to stuff a few years ago, specifically due to it being "fast". It was claimed that while it has worse compression, the speed can make it a worthy trade off.
As far as I understand this benchmark, JXL was using 8 CPU cores, while QOI naturally only used one. If you were to plot the graph with compute used (watts?) instead of Mpx/s, QOI would compare much better.
Also, curious that they only benchmarked QOI for "non-photographic images (manga)", where QOI fares quite badly because it doesn't have a paletted mode. QOI does much better with photos.
Actually, they did try QOI for the photographic images:
> Not shown on the chart is QOI, which clocked in at 154 Mpx/s to achieve 17 bpp, which may be “quite OK” but is quite far from Pareto-optimal, considering the lowest effort setting of libjxl compresses down to 11.5 bpp at 427 Mpx/s (so it is 2.7 times as fast and the result is 32.5% smaller).
17 bpp is way outside the area shown in the graph. All the other results would've gotten squished and been harder to read, had QOI been shown.
I just ran qoibench on the photos they used[1] and QOI does indeed fare pretty badly, with a compression ratio of 71.1% vs. 49.3% for PNG.
The photos in the QOI benchmark suite[2] somehow compress a lot better (e.g. photo_kodak/, photo_tecnick/ and photo_wikipedia/). I guess it's the film grain with the high resolution photos used in [1].
One does wonder how much of JXL's awesomeness is the encoder vs. the format. Its ability to make high quality, compact images just with "-d 1.0" is uncanny. With other codecs, I had to pass different quality settings depending on the image type to get similar results.
That's a very good point. At this rate of development I wouldn't be surprised if libjxl becomes x264 of image encoders.
On the other hand, libvpx has always been a mediocre encoder which I think might be the reason for disappointing performance (I mean in general, not just speed) of vp8/vp9 formats, which inevitably also affected performance of lossy WebP. Dark Shikari even did a comparison of still image performances of x264 vs vp8 [0].
While WebP lossy still has image quality issues it has improved a lot over the years. One should not consider a comparison done with 2010-2015 implementations indicative of quality performance today.
I'm sure it's better now than 13 years ago, but the conclusion I got from looking at very recent published benchmark results is that lossy WebP is still only slightly better than MozJPEG at low bitrates and still has a worse maximum perceptual-quality ceiling than JPEG, which in my opinion makes it not worth using over plain old JPEG even in web settings.
That matches my observations. I believe that lossy WebP does not add value when jpegli is an option, and it has a hard time competing even with MozJPEG.
Pik was initially designed without quality options, only to do the best that can be achieved at distance 1.0.
We kept a lot of focus on visually lossless and I didn't want to add format features which would add complexity but not help at high quality settings.
In addition to modeling features, the context modeling and the efficiency of entropy coding are critical at high quality. I consider AVIF's entropy coding ill-suited for high quality or lossless photography.
Dear lord... despite browsing and using GitHub on a daily basis I still miss the releases section sometimes! Before I saw your reply I checked the Scoop repos and sure enough, on Windows this will get you the latest cjpegli version installed and added to PATH in one go:
scoop install main/libjxl
Note: now that I've tried it, that is really next level for an old format!
Jpegli has the possibility of using XYB. By default, acting as a drop-in replacement for MozJPEG or libjpeg-turbo, it doesn't.
I believe Jon has compared jpegli without XYB. If you turn XYB on, you get about 10 % more compression.
Jpegli is great even without XYB. It has many other methods for success (largely copied over from JPEG XL adaptive quantization heuristics, more precise intermediate calculations, as well as the guetzli method for variable dead-zone quantization).
disclaimer: I created the XYB colorspace, most of the JPEG XL VarDCT quality-affecting heuristics, and scoped jpegli. Zoltan (from WOFF2/Brotli fame!) did the actual implementation and made it work so well.
It is worth noting that the JPEG XL effort produced a nice new portable SIMD library called Highway. This library is powering not only JPEG XL but also Google's latest Gemma AI models.
> Today we're sharing open source code that can sort arrays of numbers about ten times as fast as the C++ std::sort, and outperforms state of the art architecture-specific algorithms, while being portable across all modern CPU architectures. Below we discuss how we achieved this.
Without taking into account whether JPEG XL shines on its own or not (which it may or may not), JPEG XL completely rocks for sure because it does this:
.. $ ls -l a.jpg && shasum a.jpg
... 615504 ... a.jpg
716744d950ecf9e5757c565041143775a810e10f a.jpg
.. $ cjxl a.jpg a.jxl
Read JPEG image with 615504 bytes.
Compressed to 537339 bytes including container
.. $ ls -l a.jxl
... 537339 ... a.jxl
Do you realize how many billions of JPEG files there are out there which people want to keep? If you recompress your old JPEG files using a lossy format, you lower their quality.
But with JPEG XL, you can save 15% to 30% and still, if you want, get your original JPG 100% identical, bit for bit.
That's wonderful.
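The reverse direction can be checked in the same way; a sketch, assuming the .jxl was created from a JPEG so that it carries the reconstruction data:

$ djxl a.jxl restored.jpg
$ shasum a.jpg restored.jpg

The two hashes should be identical.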
P.S: I'm sadly on Debian stable (12 / Bookworm) which is on ImageMagick 6.9 and my Emacs uses (AFAIK) ImageMagick to display pictures. And JPEG XL support was only added in ImageMagick 7. I haven't looked more into that yet.
This particular feature might not, but if said screenshots are often compressed with JPEG XL, they will be spared the generation loss that becomes blatantly visible in some other formats: https://invidious.protokolla.fi/watch?v=w7UDJUCMTng
> The new version of libjxl brings a very substantial reduction in memory consumption, by an order of magnitude, for both lossy and lossless compression. Also the speed is improved, especially for multi-threaded lossless encoding where the default effort setting is now an order of magnitude faster.
Very impressive! The article too is well written. Great work all around.
Maybe someone here will know of a website that describes each step of the jpeg xl format in detail? Unlike for traditional jpeg, I have found it hard to find a document providing clear instructions on the relevant steps, which is a shame as there are clearly tons of interesting innovations that have been compiled together to make this happen, and I'm sure the individual components are useful in their own right!
Missing from the article is rav1e, which encodes AV1, and hence AVIF, a lot faster than the reference implementation aom. I've had cases where aom would not finish converting an image within a minute of waiting, while rav1e would do it in less than 10 seconds.
rav1e is generally head to head with libaom on static images, and which one wins on the speed/quality/size frontier depends a lot on the image and settings, as much as +/- 20%. I suspect rav1e has an inefficient block size selection algorithm, so the particular shape of blocks is a make or break for it.
I’ve only compared rav1e to mozjpeg and libwebp, and at fastest speeds it’s only barely ahead.
The article mentions encoding speed as something to consider, alongside compression ratio. I would argue that decoding speed is also important. A lot of the more modern formats (WebP, AVIF etc.) can take significantly more CPU cycles to decode than a plain old JPG. This can slow things down noticeably, especially on mobile.
Note that JPEG XL always supports progressive decoding, because the top-level format is structured in that way. The optional part is a finer-grained adjustment to make the output more suitable for specific cases.
Any computation-intensive media format on mobile is likely using a hardware decoder module anyway, and that most frequently includes JPEG. So that comparison is not adequate.
Seriously, when is the last time mobile phones used hardware decoding for showing images? Flip phones in 2005?
I know camera apps use hardware encoding but I doubt gallery apps or browsers bother with going through the hardware decoding pipeline for hundreds of JPEG images you scroll through in seconds. And when it comes to showing a single image they'll still opt to software decoding because it's more flexible when it comes to integration, implementation, customization and format limits. So not surprisingly I'm not convinced when I repeatedly see this claim that mobile phones commonly use hardware decoding for image formats and software decoding speed doesn't matter.
I don't know the current status of web browsers, but hardware encoding and decoding for image formats is alive and well. Not really relevant for showing a 32x32 GIF arrow like on HN, but very important when browsing high resolution images with any kind of smoothness.
If you don't really care about your users' battery life you can opt to disable hardware acceleration within your applications, but it's usually enabled by default, and for good reason.
> hardware encoding and decoding for image formats is alive and well
I keep hearing and hearing this but nobody has ever yet provided a concrete real world example of smart phones using hw decoding for displaying images.
That was quite contrary to my understanding, so I took some more time to verify both my claim and yours. The reality turned out to be somewhere in the middle: modern mobile SoCs do ship with hardware JPEG decoding among others, but there is no direct API for that hardware decoding module in the mobile platform itself (Android 7 and onwards use libjpeg-turbo by default, for example). But mobile manufacturers can change the implementation details behind those APIs, so it is still true that some mobiles do use hardware JPEG decoding behind the scenes. But it is hard to tell how common it is. So, well, thank you for the counterpoint; that corrected my understanding.
They ship with hardware jpeg decoders because they ship with hardware jpeg encoders for camera capture latency reasons and it turns out you can basically just run that hardware in reverse.
The SoC vendors aren't investing more than a token amount of effort into those JPEG decoders, and from experience some of them claim to exist but produce the shittiest-looking output imaginable, and more slowly than libjpeg-turbo at that.
Also you can trivially find out if your Android phone is doing this or not, just run some perf call sampling while decoding jpegs. If all you see is AOSP libraries & libjpeg-turbo, well then they aren't doing hardware decodes :)
Does JPEG XL have patent issues? I half remember something about that. Regular JPG seems fine to me. Better compression isn't going to help anyone since they will find other ways to waste any bandwidth available.
The main innovation claimed by Microsoft's rANS patent is about the adaptive probability distribution, that is, you should be able to efficiently correct the distribution so that you can use less bits. While that alone is an absurd claim (that's a benefit shared with arithmetic coding and its variants!) and there is a very clear prior art, JPEG XL doesn't dynamically vary the distribution so is thought to be not related to the patent anyway.
And yes, regular JPEG is still a fine format. That's part of the point of the article. But for many use cases, better compression is always welcome. Also having features like alpha transparency, lossless, HDR etc can be quite desirable, and those things are not really possible in JPEG.
I have an existing workflow where I take giant PNGs from designers and re-encode them to JPEG using mozjpeg. However, I can't find a way to invoke the jpegli tool in the same way, especially since it seems to just be part of the libjxl tooling? Is that right? Are there any sample invocations anywhere?
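For reference, cjpegli is built as its own binary alongside cjxl in libjxl, and a basic invocation looks roughly like this (flags from memory, so check cjpegli --help for the exact set):

cjpegli input.png output.jpg -q 85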
The problem with JPEG XL is that it is written in an unsafe language and has already had several memory safety vulnerabilities found in it.
Image codecs are used in a wide range of attacker-controlled scenarios and need to be completely safe.
I know Rust advocates sound like a broken record, but this is the poster child for a library that should never have been even started in C++ in the first place.
It’s absolute insanity that we write codecs — pure functions — in an unsafe language that has a compiler that defaults to “anything goes” as an optimisation technique.
Pretty much every codec in every browser is written in an unsafe language, unfortunately. I don't see why JXL should be singled out. On the other hand, there is a JXL decoder in Rust called jxl-oxide [1] which works quite well, and has been confirmed by JPEG as conformant. Hopefully it will be adopted for decode-only usecases.
> It’s absolute insanity that we write codecs — pure functions — in an unsafe language that has a compiler that defaults to “anything goes” as an optimisation technique.
Rust and C++ are exactly the same in how they optimize, compilers for both assume that your code has zero UB. The difference is that Rust makes it much harder to accidentally have UB.
"We've never had to wear helmets before, why start now?"
There are only a handful of image codecs that are widely accepted. Essentially just GIF, PNG, and JPG. There's a smattering of support for more modern formats, but those three dominate.
Adding a fourth image format is increasing this attack surface by a substantial margin across a huge range of software. Not just web browsers, but chat apps, server software (thumbnail generators), editors, etc...
This is the kind of thing that gets baked into standard libraries, operating systems, and frameworks. It's up there with JSON or XML.
You had better be damned sure what you're doing is not going to cause a long list of CVEs!
JPEG XL is a complex codec, with a lot of code. This increases the chance of bugs and increases the attack surface.
A (surprisingly!) good metric for complexity is the size of the zip file of the code. Libjpeg is something like 360 kB, libpng is 350 kB, and giflib is 90 kB.
The JXL source is 1.4 MB zipped, making it more than twice the size of the above three combined!
The other libraries use C/C++ not because that's a better choice, but because it was the only choice back in the ... checks Wikipedia ... 1980s and 90s!
We live in the future. We have memory-safe languages now. We're allowed to use them. You won't get in trouble from anyone, I promise.
> "We've never had to wear helmets before, why start now?"
> We live in the future. We have memory-safe languages now. We're allowed to use them. You won't get in trouble from anyone, I promise.
That's why I specifically said that it's unfortunate that C++ is still wide spread, and pointed to a fully conformant JXL decoder written in Rust :p
> There are only a handful of image codecs that are widely accepted. Essentially just GIF, PNG, and JPG. There's a smattering of support for more modern formats, but those three dominate.
Every browser ships libwebp and an AVIF decoder. Every reasonably recent Android phone does as well. And every iPhone. Every (regular) install of Windows has libwebp. Every Mac has libwebp and dav1d. That's all C and C++. AVIF in particular is only a couple of years older than JXL, and yet I've never seen opposition to it on the grounds of memory safety. That is what I meant about JXL being singled out.
> JPEG XL is a complex codec, with a lot of code. This increases the chance of bugs and increases the attack surface.
> A (surprisingly!) good metric for complexity is the size of the zip file of the code. Libjpeg is something like 360 kB, libpng is 350 kB, and giflib is 90 kB.
> The JXL source is 1.4 MB zipped, making it more than twice the size of the above three combined.
Which code exactly are you including in that? The libjxl repo has a lot of stuff in it, including an entire brand new JPEG encoder! Though jxl certainly is more complex than those three combined, since JXL is essentially a superset of all their functionality, plus new stuff.
I revised my numbers a bit by filtering out the junk and focusing only on the code that most likely contributes to the runtime components (where the security risks lie). E.g.: Excluded the samples, test suites, doco, changelogs, etc... and kept mostly just the C/C++ and assembly code.
I also recompressed all of the libraries with identical settings to make the numbers more consistent.
I believe JPEG XL binary size is about one third of AVIF binary size. It is relatively compact. It is easy to write a small encoder: libjxl-tiny is just 7000 lines of code.
This is really impressive even compared to WebP. And unlike WebP, it's backwards compatible.
I have forever associated Webp with macroblocky, poor colors, and a general ungraceful degradation that doesn't really happen the same way even with old JPEG.
I am gonna go look at the complexity of the JXL decoder vs WebP. Curious if it's even practical to decode on embedded. JPEG is easily decodable, and you can do it in small pieces at a time to work within memory constraints.
Everyone hates WebP because when you save it, nothing can open it.
That's improved somewhat, but the formats that will have an easy time winning are the ones that people can use, even if that means a browser should "save JPEG XL as JPEG" for a while or something.
Everyone hates webp for a different reason. I hate it because it can only do 4:2:0 chroma, except in lossless mode. Lossless WebP is better than PNG, but I will take the peace of mind of knowing PNG is always lossless over having a WebP and not knowing what was done to it.
Neither of these are really what I'm referring to, as I view these as ~equivalent to converting a jpeg to png. What I mean is within a pipeline, once you have ingested a [png|webp|jpeg] and you need to now render it at various sizes or with various filters for $purposes. If you have a png, you know that you should always maintain losslessness. If you have a jpeg, you know you don't. You don't need to inspect the file or store additional metadata, the extension alone tells you what you need to know. But when you have a webp, the default assumption is that it's lossy but it can sometimes be otherwise.
I've noticed in chrome-based browsers, you can right click on a webp file and "edit image". When you save it, it defaults to png download, which makes a simple conversion.
Mobile browsers seem to default to downloading in png as well.
JPEG XL can be converted to/from JPEG without any loss of quality. See another comment that shows an example where doing JPEG -> JPEG XL -> JPEG generates a binary-exact copy of the original JPEG.
Yeah, this is not what we usually call backwards compatibility, but it allows usage like storing the images as JPEG XL and, on the fly, sending a JPEG to clients that can't use it, without any loss of information. WebP can't do that.
But that only works when the JXL has been losslessly converted from a JPEG in the first place, right? So this wouldn’t work for all JXL in practice. (Unless I’ve missed something and this is not the case.)
You could start with the relatively good jpegli as a codec and then losslessly recompress that with JPEG XL. Naturally some entity (server side, app, content encoding, etc.) needs to unpack the JPEG-XL-wrapped JPEG into a usual JPEG before it can be consumed by a legacy system.
Often with this kind of Pareto analysis it can be argued that even when continuous choices are not available, a compression system could encode every other image at effort 7 and the rest at effort 6 (or any other ratio), leading on average to interpolated results. Naturally such interpolation does not produce straight lines in log space.
Yes, it should, but it looks like they just added a line to the jxl 0.10 series of data on whatever they used to make the graph, and labelled it the Pareto front. Looking closely at the graphs, they actually miss some points where version 0.9 should be included in the frontier.
I think it can be understood as an expected Pareto frontier if enough options are added to make it continuous, which is often implied in this kind of discussions.
I'm not sure that's reasonable - The effort parameters are integers between 1 and 10, with behavior described here: https://github.com/libjxl/libjxl/blob/main/doc/encode_effort..., the intermediate options don't exist as implemented programs. This is a comparison of concrete programs, not an attempt to analyze the best theoretically achievable.
Also, the frontier isn't convex, so it's unlikely that if intermediate options could be added then they would all be at least as good as the lines shown; and the use of log(speed) for the y-axis affects what a straight line on the graph means. It's fine for giving a good view of the dataset, but if you're going to make a guess about intermediate possibilities, 'speed' or 'time' should also be considered.
You are right, but that would make an uglier plot :)
Some of the intermediate options are available though, through various more fine-grained encoder settings than what is exposed via the overall effort setting. Of course they will not fall exactly on the line that was drawn, but as a first approximation, the line is probably closer to the truth than the staircase, which would be an underestimate of what can be done.
Whichever is more pessimistic. So for the axes in this article, the first one. If you have an option on the "bad" side of the Pareto curve, you can always find an option that is better in both axes. If a new option is discovered that falls on the good side of the curve, well, then the curve needs to be updated to pass thru that new option.
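For concreteness, a small sketch of how such a frontier can be computed from (density, speed) points, where lower bits per pixel and higher speed are both better (my own illustration, not the article's plotting code):

def pareto_frontier(points):
    # points: iterable of (bits_per_pixel, megapixels_per_second) tuples.
    pts = sorted(points, key=lambda p: (p[0], -p[1]))  # by density, fastest first
    frontier, best_speed = [], float("-inf")
    for bpp, speed in pts:
        if speed > best_speed:        # not dominated by any denser-or-equal point
            frontier.append((bpp, speed))
            best_speed = speed
    return frontier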
The choice to represent the speed based on multithreaded encoding strikes me as somewhat arbitrary. If your software has a critical path dependent on minimal latency of a single image, then it makes some sense, but you still may have more or fewer than 8 cores. On the other hand if you have another source of parallelism, for example you are encoding a library of images, then it is quite irrelevant. I think the fine data in the article would be even more useful if the single threaded speed and the scalability of the codec were treated separately.
Such a shame arithmetic coding (which is already in the standard) isn't widely supported in the real world. Because converting Huffman coded images losslessly to arithmetic coding provides an easy 5-10% size advantage in my tests.
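For the curious, that lossless conversion is a one-liner with jpegtran, assuming a libjpeg build with arithmetic coding compiled in (and with the caveat above that few viewers can open the result):

$ jpegtran -arithmetic -copy all input.jpg > output.jpg

Leaving out -arithmetic converts back to the Huffman-coded form, since that is jpegtran's default output.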
The benefits of JPEG kind of go away if you start adopting more recent changes to the format, no? JPEG is nice because everything has supported it for 20+ years. JPEG-with-arithmetic-coding is essentially a new, incompatible format, why not use JXL or AVIF instead?
Yes, but this is really a pity in the specific case of Arithmetic Coding, because, unlike "more recent changes to the format", it's been in the standard since the very beginning - but is not supported by a lot of implementations due to software patents (which meanwhile expired, but their damage remains).
Arithmetic coding is old, but around 2010 (when all the patents expired) there was a ton of really good research on how to do table-based ANS and vectorized rANS to make the performance good. Aside from the patent issues, arithmetic coding wasn't pursued much because the CPU cost was too high. Now that multiplication is cheap and the divisions can be avoided, ANS is a lot better than it used to be.
I'm surprised mozjpeg performed worse than libjpeg-turbo at high quality settings. I thought its aim was better perceptual quality than libjpeg-turbo at the expense of speed.
It is nice that 0.10 finally landed those memory and speed optimisations.
But the king remains HALIC. For multithreaded encoding, JPEG XL still uses 3.5x more memory than HALIC, and 6x the encoding time, while offering the same or smaller file sizes in lossless. Hopefully JPEG XL can narrow those gaps some day.
WebP is awesome at lossless and way better than even PNG.
It's because WebP has a special encoding pipeline for lossless pictures (just like PNG) while AVIF is basically just asking a lossy encoder originally designed for video content to stop losing detail. Since it's not designed for that it's terrible for the job, taking lots of time and resources to produce a worse result.
Lossless WebP is actually quite good, especially on text-heavy images; e.g. screenshots of a terminal with `cwebp -z9` are usually smaller than `cjxl -d 0 -e 9` in my experience.
Lossless AVIF is just really quite bad. Notice how for photographic content it is barely better than PNG, and for non-photographic content it is far worse than PNG.
It has lossless just to check a box in terms of supported features. A bit like how JPEG XL supports animation just to have feature parity. But in most cases, you'll be better off using a video codec for animation, and an image format for images.
There are some user-level differences between an animated image and a video, which haven't really been satisfactorily resolved since the abandonment of GIF-the-format. An animated image should pause when clicked, and start again on another click, with a setting separate from video autoplay to control the default. It should not have visible controls of any sort; that's the whole interface. It should save and display on the computer/filesystem as an image, and degrade to the display frame when sent along a channel which supports images but not animated ones. It doesn't need sound, or CC, or subtitles. I should be able to add it to the photo roll on my phone if I want.
There are a lot of little considerations like this, and it would be well if the industry consolidated around an animated-image standard, one which was an image, and not a video embedded in a way which looks like an image.
I believe it is more fundamental. I like to think that AV1 entropy coding just becomes ineffective for large values. Large values are dominantly present in high quality photography and in lossless coding. Large values are repeatedly prefix coded and this makes effective adaptation of the statistics difficult for large integers. This is a fundamental difference and not a minor difference in focus.
Usually the issue is not using the YCgCo-R colorspace. I do not see enough details in the article to know if that is the case here. There are politics around getting the codepoint included: https://github.com/AOMediaCodec/av1-avif/issues/129
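For reference, the reversible YCgCo-R variant is just a few integer adds and shifts (a sketch of the standard lifting form, not code from any particular encoder):

def rgb_to_ycgco_r(r, g, b):
    # Integer, exactly invertible (lifting) version of YCgCo.
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def ycgco_r_to_rgb(y, cg, co):
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b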
"Pareto" being used outside the context of Brazil's best prank call ever (Telerj Prank) will always confuse me. I keep thinking, "what does the 'thin-voiced lawyer' have to do with statistics?"...
It's so frustrating how the Chromium team is ending up as a gatekeeper of the Internet by picking and choosing what gets developed or not.
I recently came across another issue pertaining to the Chromium team not budging on their decisions, despite pressure from the community and an RFC backing it up - in my case custom headers in WebSocket handshakes, which are supported by other JavaScript runtimes like Node and Bun, but the Chromium maintainer just disagrees with it - https://github.com/whatwg/websockets/issues/16#issuecomment-...
> It's so frustrating how the chromium team is ending up as a gatekeeper of the Internet by pick and choosing what gets developed or not.
https://github.com/niutech/jxl.js is based on Chromium tech (Squoosh from GoogleChromeLabs) and provides an opportunity to use JXL with no practical way for Chromium folks to intervene.
Even if that's a suboptimal solution, JXL's benefits supposedly should outweigh the cost of integrating that, and yet I haven't seen actual JXL users running to that in droves.
So JXL might not be good support for your theory: where people could adopt it, they still don't. Maybe the format isn't actually that important; it's just a popular meme to rehash.
Why do you assume that the benefits would outweigh said costs? That's a weird burden to set on the format. Using JavaScript on the browser to decode it is a huge hurdle, I don't know of any format that ever got popular or got its initial usage from a similar approach. Avif was just added too, even if no one was using a js library to decode it beforehand
Fwiw I agree that there's a weird narrative around jpegxl, at the end of the day it's just a format, and I think it's not very good for lower quality images as proven by the linked article in the OP. Avif looks better in that regard.
I think it would've made more sense than WebP though (which also doesn't look good at all when not lossless), but that was like a decade ago and that ship has sailed. So avif fills a niche that WebP sucks at, while jpegxl doesn't really do that. That alone is reason enough to not bother with including it.
People don't use blurry low quality images in the web. These low qualities don't matter outside of compression research.
Average/median quality of images is between 85 to 90 depending how you calculate it.
There, users' waiting time over the lifetime of these image formats is worth about 3 trillion USD. If we can reduce 20% of it, we create wealth of 600 billion USD, distributed to the users. More savings come from data transfer costs.
> Why do you assume that the benefits would outweigh said costs? That's a weird burden to set on the format.
I'm not assuming that there are those benefits, but that there are people who see them. Those who are _very_ vocal about browsers (and Chrome in particular) not supporting it seem to think so, or they wouldn't bother.
If I propose integrating good old Targa file support into Chrome, I'd also be asked about a cost/benefit analysis. And by building and using a polyfill to add that support, I show that I'm serious about Targa files, which gives credence to my cost/benefit analysis and also lets people play around with the Targa format, hopefully making it self-evident that the format is good, and from there that these benefits based on native support would be even better.
For JXL I see people talking the talk but, by and large, not walking the walk.
I see what you mean. Yeah, I think JPEG XL is the format that I've heard about the most but never really seen in the wild. It's a chicken-and-egg problem, but still, it's basically not used at all compared to the mindshare it seems to have in these discussions.
What hammer? You want US president or supreme court to compel Chrome developers to implement every image format in existence and every JS API proposed by anyone anywhere?
Unless it is some kind of anti-competitive behavior, like intentionally stifling adoption of a standard competing with their proprietary patent-encumbered implementation that they expect to collect royalties for (which doesn't seem to be the case), I don't see the problem.
That's not how this works. Firefox is the closest we have, and realistically the closest we will get to a "better product" than Chromium for the foreseeable future, and it's clearly not enough.
The only hammer at all left is Safari, basically on iPhones only.
That hammer is very close to going away; if the EU does force Apple to really open the browsers on the iPhone, everything will be Chrome as far as the eye can see in short order. And then we fully enter the chromE6 phase.
Firefox is "neutral", which I understand as meaning they'll do whatever Chrome does.
All the code has been written, patches to add JPEG XL support to Firefox and Chromium are available and some of the forks (Waterfox, Pale Moon, Thorium, Cromite) do have JPEG XL support.
They didn't "lose interest", their lawyers pulled the emergency brakes. Blame patent holders, not Google. Like Microsoft: https://www.theregister.com/2022/02/17/microsoft_ans_patent/. Microsoft could probably be convinced to be reasonable. But there may be a few others. Google actually also holds some patents over this but they've done the right thing and license those patents along with their implementation.
To fix this, you'd need to convince Google, and other large companies that would be exposed to lawsuits related to these patents (Apple, Adobe, etc.), that these patent holders are not going to insist on being compensated.
Other formats are less risky, especially the older ones. JPEG is fine because it's been out there for so long that any patents applicable to it have long expired. Same with GIF, which was once held up by patents. PNG is at this point also fine: if any patents applied at all, they will soon have expired, as the PNG standard dates back to 1997 and the work on it depended on research from the seventies and eighties.
There are no royalties to be paid on JPEG XL. Nobody but Cloudinary and Google is claiming to hold relevant patents, and Cloudinary and Google have provided a royalty free license. Of course the way the patent system works, anything less than 20 years old is theoretically risky. But so far, there is nobody claiming royalties need to be paid on JPEG XL, so it is similar to WebP in that regard.
"Patent issues" has become a (sometimes truthful) excuse for not doing something.
When the big boys want to do something, they find a way to get it done, patents or no, especially if there's only "fear of patents" - see Apple and the whole watch fiasco.
Adobe also has an order of magnitude smaller installed base than Chrome or Firefox, which makes patent fees much cheaper. And their software is actually paid for by users.
Not that simple. Maybe they struck a deal with a few of the companies or they made a different risk calculation. And of course they have a pretty fierce patent portfolio themselves so there's the notion of them being able to retaliate in kind to some of these companies.
I don't think that's true (see my other comment for what the patent is really about), but even if it were, Adobe's adoption means that JPEG XL is worth the supposed "risk". And Google does ship plenty of technologies that are clearly patent-encumbered. If the patent were the main concern, they could have said so, because there are enough people wondering about the patent status, but the Chrome team's stated reason against JPEG XL was quite different.
Adobe sells paid products and can carve out a license fee for that, like they do with all the other codecs and libraries they bundle. That's part of the price you are paying.
The same thing can be said of many patent-encumbered video codecs which Chrome supports nevertheless. That alone can't be a major deciding factor, especially given that JPEG XL's rate of adoption has been remarkably fast compared to any recent media format.
Is this not simply a risk vs. reward calculation? Newer video codecs offer very notable bandwidth savings over old ones. JPEG XL offers minor benefits over WebP, AVIF, etc. So while the dangers are the same for both, the calculation is different.
The Microsoft patent doesn't apply to JXL, and in any case, Microsoft has literally already affirmed that they will not use it to go after any open codec.
How exactly is that done? I assume even an offhand comment by an official (like CEO, etc) that is not immediately walked back would at least protect people from damages associated with willful infringement.
I think it would have been much better for everyone involved, and for humanity, if Mr. Duda himself had gotten the patent in the first place instead of praying that no one else would.
And nothing advances your career quite like getting your employer into a multi-year legal battle and spending a few million on legal fees, to make some images 20% smaller and 100% less compatible.
But that doesn't matter. If a patent is granted, choosing to infringe on it is risky, even if you believe you could make a solid argument that it's invalid given enough lawyer hours.
The Microsoft patent is for an "improvement" that I don't believe anyone is using, but Internet commentators seem to think it applies to ANS in general for some reason.
A few years earlier, Google was granted a patent for ANS in general, which made people very angry. Fortunately they never did anything with it.
I believe that Google's patent application dealt with interleaving non-compressed and ANS data in a manner that makes streaming coding easy and fast in software, not with ANS in general. I didn't read it, but I discussed it briefly with a capable engineer who had.
Not only do you have no source backing your claim, there is a glaring counterexample: Chromium's experimental JPEG XL support carried an expiry milestone, which was delayed multiple times and last bumped in June 2022 [1], before the final removal in October, months after the patent was granted!
>To fix this, you'd need to convince Google, and other large companies that would be exposed to lawsuits related to these patents (Apple, Adobe, etc.), that these patent holders are not going to insist on being compensated.
Apple has implemented JPEG XL support in macOS and iOS. Adobe has also implemented support for JPEG XL in their products.
Also, if patents were the reason Google removed JXL from Chrome, why would they make up technical reasons for doing so?
Please don't present unsourced conspiracy theories as if they were confirmed facts.
Mate, you're literally pulling something from your ass. Chrome engineers claim that they don't want JXL because it isn't good enough. Literally no one involved has said that it has anything to do with patents.
> There must be a more rational reason than that.
I've not heard anything better than legal reasons. But do correct me if I'm wrong. I've worked in big companies, and patents can be a showstopper. It seems like a plausible theory (i.e. not a conspiracy theory).
In your first comment, you stated as a fact that "lawyers pulled the emergency brakes", despite literally no one from Google ever saying this, and despite Google giving very different reasons for the removal.
And now you act as if something you made up in your mind is the default theory and the burden of proof is on the people disagreeing with you.
Doesn't make sense when they support GIF or animated WebP as images. Animated WebP in particular is just a purposely gimped WebM that should not exist at all and would not need to exist if we could use video files directly.
If you want a simple conspiracy theory, how about this:
The person responsible for AVIF works on Chrome, and is responsible for choosing which codecs Chrome ships with. He obviously prefers his AVIF to a different team's JPEG-XL.
Helping the web to evolve is challenging, and it requires us to make difficult choices. We've also heard from our browser and device partners that every additional format adds costs (monetary or hardware), and we’re very much aware that these costs are borne by those outside of Google. When we evaluate new media formats, the first question we have to ask is whether the format works best for the web. With respect to new image formats such as JPEG XL, that means we have to look comprehensively at many factors: compression performance across a broad range of images; is the decoder fast, allowing for speedy rendering of smaller images; are there fast encoders, ideally with hardware support, that keep encoding costs reasonable for large users; can we optimize existing formats to meet any new use-cases, rather than adding support for an additional format; do other browsers and OSes support it?
After weighing the data, we’ve decided to stop Chrome’s JPEG XL experiment and remove the code associated with the experiment. [...]
I'll try to make a bullet-point list of the individual concerns; the original statement is written in a style that is a bit confusing for a non-native speaker like me.
* Chrome's browser partners say JPEG XL adds monetary or hardware costs.
* Chrome's device partners say JPEG XL adds monetary or hardware costs.
* Does JPEG XL work best for the web?
* What is JPEG XL compression performance across a broad range of images?
* Is the decoder fast?
* Does it render small images fast?
* Is encoding fast?
* Is there hardware encoding support to keep encoding costs reasonable for large users?
* Do we need it at all or just optimize existing formats to meet new use-cases?
Jpegli is great. JPEG XL allows for about 35 % more savings on top of it. Compared to jpegli, that creates a few hundred billion of wealth in users' waiting time. So, it's a yes.
* Do other browsers and OSes support JPEG XL?
Partially. iOS and Safari support it. DNG supports it. Windows and some Android devices don't.
* Can it be done sufficiently well with WASM?
Wasm creates additional complexity, adds to load times, and possibly to computation times too.
Some more work is needed before all of Chrome's questions can be answered.
It's a real shame, because this is one of those few areas where Firefox could have led the charge instead of following in Chrome's footsteps. I remember when they first added APNG support and it took Chrome years to catch up, but I guess those days are gone.
Oddly enough, Safari is the only major browser that currently supports it despite regularly falling behind on tons of other cutting-edge web standards.
I followed the Mozilla/Firefox integration closely. I was able to observe enthusiasm from their junior- to staff-level engineers (LinkedIn-assisted analysis of the related bugs ;-). However, an engineering director stepped in and locked the discussions because they were at the "no new information" stage; Mozilla's position has been neutral on JPEG XL, and the integration has not progressed from the nightly builds to the next stage.
Ten years ago Mozilla had the most prominent image and video compression effort, called Daala. They posted inspiring blog posts about their experiments. Some of their work was integrated with Cisco's Thor and On2's/Chrome's VP8/9/10, leading to AV1 and AVIF. Today, I believe, Mozilla has moved away from this research and the ex-Daala researchers have found new roles.
Daala's and Thor's features were supposed to be integrated into AV1, but in the end, they wanted to finish AV1 as fast as possible, so very little that wasn't in VP10 made it into AV1. I guess it will be in AV2, though.
> ... very little that wasn't in VP10 made it into AV1.
I am not sure I would say that is true.
The entire entropy coder, used by every tool, came from Daala (with changes in collaboration with others to reduce hardware complexity), as did some major tools like Chroma from Luma and the Constrained Directional Enhancement Filter (a merger of Daala's deringing and Thor's CLPF). There were also plenty of other improvements from the Daala team, such as structural things like pulling the entropy coder and other inter-frame state from reference frames instead of abstract "slots" like VP9 (important in real-time contexts where you can lose frames and not know what slots they would have updated) or better spatial prediction and coding for segment indices (important for block-level quantizer adjustments for better visual tuning). And that does not even touch on all of the contributions from other AOM members (scalable coding, the entire high-level syntax...).
Were there other things I wish we could have gotten in? Absolutely. But "done" is a feature.
Some "didn't make it in" things that looked promising were the perceptual vector quantization[1], and a butterfly transform that Monty was working on, IIRC as an occasional spectator to the process.
Dropping PVQ was a hard choice. We did an initial integration into libaom, but due to substantial differences from the way that Daala was designed, the results were not outstanding [1]. Subsequent changes to the codebase made PVQ regress significantly from there, for reasons that were not entirely clear. When we sat down and detailed all of the work necessary for it to have a chance of being adopted, we concluded we would need to put the whole team on it for the entire remainder of the project. These were not straightforward engineering tasks, but open problems with no known solutions. Additional changes by other experiments getting adopted could have complicated the picture further. So we would have had to drop everything else, and the risk that something would not work out and PVQ would still not have gotten in was very high.
The primary benefit of PVQ is the side-information-free activity masking. That is the sort of thing that cannot be judged via PSNR and requires careful subjective testing with human viewers. Not something you want to be rushing at the last minute. After gauging the rest of AOM's enthusiasm for the work, we decided instead to improve the existing segmentation coding to make it easier for encoders to do visual tuning after standardization. That was a much simpler task with much less risk, and it was adopted relatively easily. I still think it was the right call.
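For readers who haven't seen PVQ before, here is a rough sketch of the gain-shape idea in TypeScript. This is not the Daala or libaom code; the pulse count and companding exponent are made-up illustration values. The point is that coding the gain on a compressive curve yields the activity masking described above without any side information.

```typescript
// Rough sketch of gain-shape quantization in the spirit of PVQ.
// Not the Daala/AV1 implementation; constants are illustrative only.

// Greedily place `k` unit pulses so the integer vector y points as close
// as possible to the direction of x (i.e. maximizes cos^2 of the angle).
function quantizeShape(x: number[], k: number): number[] {
  const y = new Array<number>(x.length).fill(0);
  let corr = 0;    // running <|x|, y>
  let energy = 0;  // running ||y||^2
  for (let p = 0; p < k; p++) {
    let best = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < x.length; i++) {
      const c = corr + Math.abs(x[i]);   // correlation if a pulse is added at i
      const e = energy + 2 * y[i] + 1;   // ||y||^2 grows by 2*y_i + 1
      const score = (c * c) / e;
      if (score > bestScore) { bestScore = score; best = i; }
    }
    corr += Math.abs(x[best]);
    energy += 2 * y[best] + 1;
    y[best] += 1;
  }
  return y.map((v, i) => (x[i] < 0 ? -v : v)); // restore the signs of x
}

// Quantize one band as (companded gain, shape). Because the gain is coded on a
// compressive curve, high-energy (textured) bands get coarser quantization,
// which is the side-information-free activity masking mentioned above.
function pvqQuantizeBand(band: number[], pulses = 16, gainExp = 0.66) {
  const gain = Math.hypot(...band);
  const codedGain = Math.round(Math.pow(gain, gainExp)); // quantized gain index
  const shape = quantizeShape(band, pulses);
  return { codedGain, shape };
}

// Example: a flat band and a "textured" band get the same shape budget,
// but the textured band's gain is coded more coarsely.
console.log(pvqQuantizeBand([1, 0.5, -0.2, 0.1]));
console.log(pvqQuantizeBand([40, -35, 20, -10]));
```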
All those requests to revert the removal are funny: you want Chrome to re-add jxl behind a feature flag? Doesn't seem very useful.
Also, all those Chrome offshoots (Edge, Brave, Opera, etc) could easily add and enable it to distinguish themselves from Chrome ("faster page load", "less network use") and don't. Makes me wonder what's going on...
> you want Chrome to re-add jxl behind a feature flag? Doesn't seem very useful.
Chrome has a neat feature where some flags can be enabled by websites, so that websites can choose to cooperate in testing. They never did this for JXL, but if they re-added JXL behind a flag, they could do so with that kind of testing enabled. Then they could get real data from websites actually using it, without committing to supporting it if it isn't useful.
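If I'm reading this right, that's Chrome's origin trials mechanism. A minimal sketch of how a site opts in, assuming a hypothetical trial token (JXL never had an origin trial, so the value here is purely a placeholder):

```typescript
// Sketch only: a server opting a site into a Chrome origin trial by sending
// an Origin-Trial response header. The token value is a placeholder; JPEG XL
// never had an origin trial, so there is no real token to put here.
import http from 'node:http';

const ORIGIN_TRIAL_TOKEN = '<token from the origin trials console>'; // placeholder

http.createServer((req, res) => {
  res.setHeader('Origin-Trial', ORIGIN_TRIAL_TOKEN);
  res.setHeader('Content-Type', 'text/html');
  res.end('<img src="/photo.jxl" alt="experimental format test">');
}).listen(8080);
```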
> Also, all those Chrome offshoots (Edge, Brave, Opera, etc) could easily add and enable it to distinguish themselves from Chrome ("faster page load", "less network use") and don't. Makes me wonder what's going on...
Edge doesn't use Chrome's own codec support; it uses the Windows media framework. JXL is being added to it next year.
It can; that's why you didn't just say "re-add jxl" but had to mention the flag. "Re-add" has no flag implication; that pedantic attempt to constrain it is something you've made up. That's not what people want; just read those linked issues.
It has a flag implication because jpeg-xl never came without being hidden behind a flag. Nothing was taken away from ordinary users at any point in time.
And I suppose the Chrome folks have the telemetry to know how many people set that damn flag.
> “On display? I eventually had to go down to the cellar to find them.”
> “That’s the display department.”
> “With a flashlight.”
> “Ah, well, the lights had probably gone.”
> “So had the stairs.”
> “But look, you found the notice, didn’t you?”
> “Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.’”
People are asking for the old thing to be re-added, but without the flag that sabotaged it. It is the same as "you took that away from us, undo that!" Removing a flag does not turn it into a magical, mystical new thing that has to be built from scratch. This is silly. The entire point of having flags is to provide a testing platform for code that may one day have the flag removed.
Actual users, perhaps. Or maybe concern trolls paid by a patent holder who's trying to prepare the ground for a patent-based extortion scheme. Or maybe Jon Sneyers with an army of sock puppets. These "actual users" are just as real to me as Chrome's telemetry.
That said: these actual users didn't demonstrate any hacker spirit or interest in using JXL in situations where they could. Where's the widespread use of jxl.js (https://github.com/niutech/jxl.js) to demonstrate that there are actual users desperate for native codec support? (Aside: jxl.js is based on Squoosh, which is a product of GoogleChromeLabs.) If JXL is sooo important, surely people would use whatever workaround they can, whether or not that convinces the Chrome team, simply because they benefit from using it, no?
Instead, all I see is people _not_ exercising their freedom and initiative to support that apparently-best-thing-since-sliced-bread format, but whining that Chrome is oh-so-dominant and forces its choice of codecs upon everybody else.
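The workaround wouldn't even need to be elaborate. Here is a minimal sketch of "use JXL natively where supported, decode in WASM otherwise"; decodeJxlToBlob stands in for whatever the WASM decoder actually exposes (jxl.js's real interface may differ), and the probe URL is a placeholder:

```typescript
// Sketch of "use JXL natively where you can, decode in WASM where you can't".
// decodeJxlToBlob() stands in for a real WASM decoder API (jxl.js's actual
// interface may differ); PROBE_JXL_URL is a tiny JXL image you host yourself.
declare function decodeJxlToBlob(data: Uint8Array): Promise<Blob>;
const PROBE_JXL_URL = '/probe.jxl';

function supportsNativeJxl(): Promise<boolean> {
  return new Promise(resolve => {
    const img = new Image();
    img.onload = () => resolve(true);
    img.onerror = () => resolve(false);
    img.src = PROBE_JXL_URL;
  });
}

async function setJxlImage(el: HTMLImageElement, jxlUrl: string, legacyUrl: string) {
  if (await supportsNativeJxl()) {
    el.src = jxlUrl;                     // the browser decodes it natively
    return;
  }
  try {
    const resp = await fetch(jxlUrl);
    const blob = await decodeJxlToBlob(new Uint8Array(await resp.arrayBuffer()));
    el.src = URL.createObjectURL(blob);  // decoded by the WASM polyfill
  } catch {
    el.src = legacyUrl;                  // last resort: ship a JPEG/PNG/WebP
  }
}
```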
We have been active on WASM implementations of JPEG XL, but it doesn't really work with progressive rendering, HDR canvas was still not supported, thread pools and SIMD had hiccups, etc. The browser wasn't, and still isn't, ready for high-quality codecs as modules. We keep giving gentle guidance on these issues, but at heart our small team is an algorithm and data format research group, not a technology lobbying organization, so we haven't yet been successful there.
In the current scenario, JPEG XL users are most likely to emerge outside of the web, in professional and prosumer photography, and then we will have, unnecessarily, two different format worlds: JPEG XL for photography processing and a variety of web formats, each with their own problems.
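As a small illustration of one of those hurdles: multi-threaded WASM decoding only works when the page is cross-origin isolated, which many sites are not. A minimal browser-side check (a sketch, not part of any libjxl tooling):

```typescript
// Sketch: check whether a page can even run a multi-threaded WASM decoder.
// WASM threads need SharedArrayBuffer, which browsers only expose when the
// page is cross-origin isolated (COOP/COEP headers set by the server).
function wasmThreadsUsable(): boolean {
  return typeof SharedArrayBuffer !== 'undefined' &&
         (globalThis as { crossOriginIsolated?: boolean }).crossOriginIsolated === true;
}

if (!wasmThreadsUsable()) {
  console.warn('Not cross-origin isolated: falling back to single-threaded decoding.');
}
```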
I tried jxl.js; it was very finicky on iPad: out-of-memory errors [0] and blurry images [1]. In the end I switched to a proxy server that re-encoded JXL images into PNG.
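A rough sketch of what such a proxy might look like (not the commenter's actual setup; assumes Node.js 18+ with libjxl's djxl CLI on PATH, and a placeholder upstream origin):

```typescript
// Rough sketch of a JXL-to-PNG re-encoding proxy (not the commenter's setup).
// Assumes Node.js 18+ and libjxl's `djxl` CLI on PATH; UPSTREAM is a placeholder.
import http from 'node:http';
import path from 'node:path';
import { tmpdir } from 'node:os';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import { mkdtemp, readFile, writeFile, rm } from 'node:fs/promises';

const run = promisify(execFile);
const UPSTREAM = 'https://images.example.com'; // placeholder image origin

http.createServer(async (req, res) => {
  try {
    // Fetch the original .jxl from the upstream host.
    const upstream = await fetch(UPSTREAM + (req.url ?? '/'));
    const jxl = Buffer.from(await upstream.arrayBuffer());

    // Decode it to PNG with djxl in a temporary directory.
    const dir = await mkdtemp(path.join(tmpdir(), 'jxl-'));
    const inPath = path.join(dir, 'in.jxl');
    const outPath = path.join(dir, 'out.png');
    await writeFile(inPath, jxl);
    await run('djxl', [inPath, outPath]);
    const png = await readFile(outPath);
    await rm(dir, { recursive: true, force: true });

    res.writeHead(200, { 'Content-Type': 'image/png' });
    res.end(png);
  } catch {
    res.writeHead(502);
    res.end('decode failed');
  }
}).listen(8080);
```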
Both issues seem to have known workarounds that could have been integrated to support JXL on iOS properly, sooner than waiting on Apple (who apparently added JXL in Safari 17), so if anything that's a success story for "provide polyfills to support features without relying on the browser vendor."