John Carmack on JPEG (twitter.com/id_aa_carmack)
490 points by tosh on June 4, 2021 | 214 comments



There's JPEG 2000, which is nothing like classic JPEG. It doesn't have the artifacts classic JPEG introduces at sharp edges, so you can read text. It has more color depth if you want it. It also doesn't mess up non-color data such as normal maps the way classic JPEG does.

JPEG 2000 is not used much. Decoding is slow, and encoding is slower. The decoders are either buggy or proprietary. It has way too many options internally. The big users of JPEG 2000 are medical. It has a "lossless" mode, and medical imagery is usually stored lossless because you really don't want compression artifacts in X-rays.

(I've been struggling with JPEG 2000 recently. Second Life and Open Simulator use it for asset storage. Some images won't decompress properly with OpenJPEG, a free decoder. I finally found out why. There's a field used for "personal health information" in JPEG 2000. This is where the patient name and such go in a CAT scan. That feature was added after the original version. Some older images apparently have junk in that field, which causes problems.)


OpenJPEG has improved the situation a bit, but I use JPEG 2000 as a cautionary example. The community had some good selling points but assumed that meant inevitability, and slacked off on browser compatibility, open source, and even interoperability. For the first decade or so you had to pay something like $1,500 to get a copy of the spec, too.

I work in the library/archiving space, where people spent years trying to make this happen due to the compression wins, support for a good range of color depths and spaces, and progressive decoding, which is perfect for browsing galleries of high-res images before zooming way into the one you want.

The frictional cost largely canceled that out: people don’t trust unreliable formats and the JP2 files which only opened in one of {Kakadu, Aware, Adobe} and couldn’t be used with any open source tools live long in the memory. Performance in Jasper was wretched … and while the vendors thought that’d boost sales, it seemed far more effective to me at getting people to use other formats instead.

These days, if anyone is talking about new image formats the first question is what they’re doing for open source (especially things like ImageMagick which half the world uses) and specifically browsers. A WASM polyfill for <picture> or <img srcset> is really critical because it means people don’t have to transcode everything for a handful of users.


Proprietary stuff just isn't worth it. They erect so many barriers around the format and the specification that it's essentially impossible to get anything done.


This reminds me of Sony's minidisc technology. Such a cool product for its time but they purposely made it impossible to use and incompatible with everything.


Sony UMDs too!


That’s my general conclusion, and it’s substantially more so for the library/archive case, where long-term stability is so important. It’d be one thing if, say, a game found a format to be so much better when they control the encode and decode sides completely, but once that’s not true it really needs a first-class open source implementation to even be considered.


> Second Life and Open Simulator use it for asset storage

JPEG2000 is such a cool choice for Second Life. The fact that any truncation of a JPEG2000 bitstream is just a lower-resolution version of the image makes for convenient progressive enhancement when loading textures over the network, right?

(And Second Life even used JPEG2000 for geometry, sort-of. I guess with the advent of proper mesh support, that may be less common now though.)


JPEG itself has that property - you can decode any JPEG at 1/8 the size pretty easily, and progressive scan JPEGs do what you want here.

Dedicated texture compression formats would also do this since they use mipmaps, but I don't know if you can stream those.
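
For reference, the 1/8 decode mentioned above is just a couple of fields in libjpeg. A minimal sketch (error handling omitted; "in.jpg" is a placeholder path) - scaling by M/8 is cheap because the decoder can shortcut the 8x8 IDCT:

  #include <stdio.h>
  #include <jpeglib.h>

  int main(void) {
      struct jpeg_decompress_struct cinfo;
      struct jpeg_error_mgr jerr;
      FILE *f = fopen("in.jpg", "rb");
      if (!f) return 1;

      cinfo.err = jpeg_std_error(&jerr);
      jpeg_create_decompress(&cinfo);
      jpeg_stdio_src(&cinfo, f);
      jpeg_read_header(&cinfo, TRUE);

      /* Ask for a 1/8-size decode before starting decompression. */
      cinfo.scale_num = 1;
      cinfo.scale_denom = 8;

      jpeg_start_decompress(&cinfo);
      printf("decoding at %ux%u\n", cinfo.output_width, cinfo.output_height);

      JSAMPARRAY row = (*cinfo.mem->alloc_sarray)((j_common_ptr)&cinfo,
          JPOOL_IMAGE, cinfo.output_width * cinfo.output_components, 1);
      while (cinfo.output_scanline < cinfo.output_height)
          jpeg_read_scanlines(&cinfo, row, 1);  /* consume reduced-size rows */

      jpeg_finish_decompress(&cinfo);
      jpeg_destroy_decompress(&cinfo);
      fclose(f);
      return 0;
  }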


For JPEG, if an image is encoded using resolution progression, then the decoder cannot decode using quality progression (where image quality improves with each new piece of the image), and vice versa. With JPEG 2000, the decoder can decide which progression it wishes to use at decode time - by resolution, by quality, by component, or by spatial region.


Anyway, progressive JPEG 2000 decoding is more sophisticated than progressive JPEG.


This is super interesting. Do you have any more details on the use of JPEG2000 for geometry? I would be curious to hear more.


Sculpted prim. The object was basically a mesh of triangles in a fixed configuration, but the coordinates of the individual vertices were derived from an image (compressed using lossless JPEG2000, as any compression artifacts, not to mention chroma subsampling, would cause really ugly distortion). This was a halfway point between allowing arbitrary meshes and only pre-made object geometries (but with skinnable textures).

http://wiki.secondlife.com/wiki/Sculpted_prim
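
Roughly, the decoding side worked like this (an illustrative sketch, not Second Life's actual code; the struct and scaling are assumptions):

  /* Each texel's R/G/B byte is read as an X/Y/Z coordinate inside the
     prim's bounding box; lossless compression matters because any
     artifact becomes a dent in the surface. */
  typedef struct { float x, y, z; } Vertex;

  void decode_sculpt_map(const unsigned char *rgb /* 3 bytes per texel */,
                         int w, int h, Vertex *out /* w*h vertices */) {
      for (int v = 0; v < h; v++) {
          for (int u = 0; u < w; u++) {
              const unsigned char *p = rgb + 3 * (v * w + u);
              /* Map 0..255 to -0.5..0.5 of the bounding box. */
              out[v * w + u].x = p[0] / 255.0f - 0.5f;
              out[v * w + u].y = p[1] / 255.0f - 0.5f;
              out[v * w + u].z = p[2] / 255.0f - 0.5f;
          }
      }
      /* Triangles then connect grid neighbours: (u,v), (u+1,v), (u,v+1), etc. */
  }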


It also seems like support for JPEG 2000 can be hit or miss. I tried to open a .jp2 file I downloaded from the web in GIMP, but it looks all funky.

Original image: https://www.fnordware.com/j2k/jp2samples.html

What it looks like in GIMP: https://imgur.com/a/HCNz7ga


And iOS Safari shows it as a mostly green mess. After tapping the image to get to the real .jp2, in the page itself it’s a PNG.


Wow, I just checked and you're right. I find it fascinating that all of this modern software continues to support JPEG 2000 in such a buggy state. Or maybe it's the file itself that's corrupted? I don't know enough about the file format to say.


The file is obviously wrong.


The file is correct. You can open it in ImageMagick (display) to check. It could be a simple problem with the colorspace implementation of GIMP or something similar. If you read the comments, you'd know what a mess JPEG 2000 implementations are.


Something appears to be wonky with the ICC profile embedded in this image. And while I have no idea whether the file itself complies with applicable standards, I can say that ImageMagick is not a reliable test in this case. To wit:

1. Per Photoshop, "The embedded ICC profile cannot be used because the ICC profile is invalid."

2. After using the OS X Preview.app "Assign Profile" command to replace the embedded ICC profile with the system-supplied sRGB profile, the image displays correctly in Safari.

3. The ImageMagick display command is apparently ignoring the embedded profile and assuming sRGB, as it continues to display the image correctly even when I replace the embedded profile with an obviously incorrect (but valid and correctly interpreted by Photoshop and Safari) profile.


The same with ImageGlass, my default image viewer on Windows.


Firefox (Ubuntu) wants to download it instead of display it...


> JPEG 2000 is not used much

There's one big win, though. Digital cinema projectors, now in pretty much all theatres, run DCPs, which have movies encoded with it.


We used it to compress scanned PDFs, using mixed raster techniques. Since PDF natively supports JPEG2000, we used it for document background compression, and we used JBIG2 for text image compression. The compression ratio was impressive, but the document rendering was noticeably slower, even with Adobe Reader …


>background compression and we used JBIG2 for text image compression

Are you aware of the downsides?

https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...

It stands to reason that Xerox wasn't the only one with JBIG2 problems.


Thank you for the link; yes, we are aware of this particular issue. For our engine we used only lossless mode, and in this mode the decompressed binary image is identical to the original, so the described problem cannot occur. Nonetheless, this severely limits the usefulness of JBIG2.


Adding to the list: ESA's Sentinel 2 products (granules/tiles) are typically distributed in JPEG2000 format. Anyone working with this imagery has probably experienced all the trouble associated with extracting and transforming it into a more useful format.


Images from NASA's Solar Dynamics Observatory Atmospheric Imaging Assembly telescopes are available from various NASA and ESA servers as JPEG2000. The Java software JHelioviewer (http://www.jhelioviewer.org/) displays these 4K×4K JPEG2000 files with OpenGL acceleration at high frame rates, and it is beautiful.


macOS and iOS have always supported JPEG2000.


Red Camera ripped off JPEG2000 to create its codec; now they extort every other camera vendor that wants to do compressed raw, thanks to the bullshit patent they were able to acquire on it.


JPEG 2000 is ancient. It is nearly as old as JPEG itself, and it long since missed its chance to become relevant. Today it is a legacy format and should be mostly ignored. It has its own types of artefacts, which in some cases are even worse for image quality than those of JPEG.

The JPEG committee has introduced multiple new formats since JPEG 2000, all of which are better, such as JPEG XR and JPEG XL.


JPEG was designed in 1992; JPEG 2000 was designed in 1997-2000.


The year right now is 2021. So yes, JPEG 2000 is almost as old as JPEG.


What's the advantage of JPEG2000 lossless over, say, PNG? Archives will probably be compressed later on anyway, and using a widely supported and known format seems far more suited for archival purposes.


JPEG2000 can be more of a data container than an image format. While the downsides are many, there are a few upsides to JPEG2000. Greater bit depth is useful when storing DNs. More than three channels is useful when storing multi-spectral images. Internal tiling is useful when range reading specific parts of an image (obvs more useful when working with very large images). These specifically make JPEG2000 more useful than PNG when working in certain domains. Satellite imagery is one such domain.


Why not just TIFF for container?


DNs?


Digital Negatives


Digital number, the output of an analog to digital converter.


I thought Digital Negative.


I can see how it would be useful for digital negatives too, but I don't know much about those.


It's a fancy word for raw files.


Thanks!


> Archives will probably be compressed later on anyway

What do you mean by this? Zip, 7zip, gzip, zstd, etc. get little to no compression on PNG, JPEG, or JPEG2000. Presumably you mean something other than zipping up directories of images.


Zip compressing a single PNG is kinda redundant, since it already uses the same algorithms as part of its own lossless compression.


Yes. I'm aware that zip, gzip, and PNG all use zlib DEFLATE compression.


The issue you are facing with OpenJPEG has nothing to do with PHR (which isn't actually "personal health information"), but with the fact that the library doesn't decode truncated images. The standard is designed for best-effort decoding when there are corrupt or missing packets, but the implementation errors out when it detects this.


Other big users of JPEG 2000 include libraries and archives storing e.g. historic map images.


Ew. Some years ago I almost, but not quite entirely, rewrote pnglite exactly because medical data has to be lossless, mostly grayscale, and in the tens of megabytes, and JPEG2000 doesn't even come close to cutting it. Something like 10-20 Mpix lossless grayscale, actually approaching good old X-ray photos.

What possessed someone to actually attempt to use JPEG2000, I do not understand.


Technically, PDFs support JPEG 2000 though this isn't widely used.


The eBooks on archive.org are probably the most well-known example. They take noticeably longer to display too, which was what led me to discover that fact.


Carmack's game Rage used JPEG XR I believe.


>JPEG 2000 is not used much

Apple has decided it shouldn't be by not allowing its use on iOS.


Someone on the internet says Apple iOS does support it: https://davidwalsh.name/how-to-use-jpeg-2000-jp2-for-a-faste...


Fun note: Red Camera ripped off JPEG2000 and of course got a patent on it through the USA's derelict patent office. Now they're extorting any company that wants to do compressed raw recording.


"you really don't want compression artifacts in X-rays"

You really don't want compression artifacts in any image, which is why you should set the compression ratio wisely when encoding. I don't see how X-rays are any different.


Compression artifacts in X-rays can easily kill people, either by requiring additional unnecessary X-rays (which both delay diagnosis and cause cancer) or by causing erroneous diagnoses; by comparison, the cost of the data storage thus saved is trivial. Compression artifacts in filtered photos of your cute pet turtle for Instagram are much less likely to kill people.


And this becomes increasingly true as compression methods get increasingly clever.

https://www.theregister.com/2013/08/06/xerox_copier_flaw_mea...


Ah yes, the JBIG2 fiasco!

JBIG2 is a format for storing black and white documents in a highly compressed way. It works by detecting each letter in the document, and then replacing it with a pointer to the reference version of that letter, up to a certain threshold. Basically compression via OCR.

Of course, this means that when a distorted letter is too close to the reference version of another letter, it will get replaced with a clean version of that incorrect one. So even though a human could easily recognize that something was off with that letter in the original image, the JBIG2-compressed image has no such clue!

What’s really bad is that JBIG2 compression was built into certain Xerox machines that were used by archivists to digitize important documents for years until someone noticed the discrepancies. JBIG2 was promptly banned for archival purposes, but there might still be a ton of documents with these kinds of invisible errors in our archives! :-)
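
For the curious, the dangerous part is easy to picture. Here's a hedged sketch of the kind of symbol matching a lossy JBIG2 encoder does (illustrative, not any real encoder's code) - the threshold is the fatal tuning knob:

  #include <limits.h>

  /* Hamming distance between two same-sized 1bpp glyph bitmaps. */
  static int hamming(const unsigned char *a, const unsigned char *b, int nbytes) {
      int d = 0;
      for (int i = 0; i < nbytes; i++) {
          unsigned char x = (unsigned char)(a[i] ^ b[i]);
          while (x) { d += x & 1; x >>= 1; }
      }
      return d;
  }

  /* Returns the index of the dictionary symbol to substitute for the
     glyph, or -1 to code it as a new symbol. If threshold is too lax,
     a smudged 6 silently becomes a clean 8. */
  int match_symbol(const unsigned char *glyph, int nbytes,
                   const unsigned char **dict, int ndict, int threshold) {
      int best = -1, best_d = INT_MAX;
      for (int i = 0; i < ndict; i++) {
          int d = hamming(glyph, dict[i], nbytes);
          if (d < best_d) { best_d = d; best = i; }
      }
      return best_d <= threshold ? best : -1;
  }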


It would be so cool to add the OCR result as metadata. Text in internet images could be readily selectable and available to assistive technologies if images were OCRed at creation time.


PDF supports this use case by adding an invisible text layer on top of the raster content.

On the other hand, JBIG2 doesn’t actually do OCR. It only does template matching of similar-looking blocks of pixels. The compressor doesn’t try to understand which letter those pixels represent.


But aren't medical images interpreted by eye only, so that compression artifacts not visible to the eye shouldn't be a problem?


Artifacts can be visible. Also they can be destructive.


A lot of algorithms are applied to medical images, as pre-processing for eye examination but also for automated analysis.


IIRC there was a study which indicated oncologists' ability to detect tumors in X-ray images was degraded even with lossy compression ratios which didn't introduce obvious artifacts. I couldn't find it with a quick search though.


Sure, but you don't normally end up getting an unnecessary biopsy with most other image artifacts.


You don't want to do anything lossy in any medical product, otherwise you'd have to prove in the certification process that your lossy compression doesn't introduce any risks.


For storage, yes, but lossy compression can be useful to improve the performance of your UI, for example, as long as the user knows that the image displayed has been degraded.


Some images manage to communicate in spite of high compression. X-rays are an example of an image where the cost of misinterpretation is very high.


The XVideo extension, commonly used by MPlayer (at least in the past, maybe still?), has supported sending YUV images to your video card for 20½ years, since XFree86 4.0.2: https://www.cs.ait.ac.th/~on/mplayer/en/xv.html

There are various problems in XVideo that make it hard to use for general application display, but the fact that this was a thing they included in it makes me think that a substantial number of video cards have included YUV decoding capability in hardware since the previous millennium.


Since I just learned the hard way that "XVideo" can produce unrelated results if you put it into a search engine, here's a direct Wikipedia link for those who want to know more about it: https://en.wikipedia.org/wiki/X_video_extension


Oops. Sorry. I didn't think of that. My search was [yuv xvideo driver].


YUV has been the standard video color encoding on TV. It is in fact a backwards-compatible extension to black and white, as the Wikipedia page puts it: https://en.wikipedia.org/wiki/YUV

While not standard, writing a shader to decode is not hard (admittedly, if you know shaders).

The time I spent dealing with video decode/encode was not pleasurable but interesting.

The really hard part is the actual encoding/decoding algorithms.


> While not standard, writing a shader to decode is not hard (admittedly, if you know shaders)

Ah, the old "Step 2: Draw the rest of the owl" scenario.


We'll leave that as an exercise for the reader


YUV was the standard in PAL; NTSC used something called YIQ.


Is that why the industry joke was the NTSC acronym meant “Never The Same Color Twice”?


I don't think so. I think NTSC's color issues (compared to PAL) were due more to how PAL alternates the phase of the lines (PAL = Phase Alternating Line, btw). The alternating phase of PAL had the effect of canceling out transmission errors (in the colors), providing more stable color than NTSC. This is somewhat similar to how balanced pairs in ethernet cables cancel out transmission errors. (See also: balanced lines in long pro audio cables, or differential pairs in PCB routing.)


It's similar, but note that what differential-voltage signaling cancels (as in pro audio cables, PCB routing, or ethernet cables) is common-mode EMI; but, as I understand it, the differential-phase signaling in PAL instead cancels out errors introduced by dispersion (where the phase delay varies significantly over the 6MHz bandwidth of a TV channel).

So it's analogous, but there's a critically important difference. (Not that you claimed otherwise.)


I'd add the relative lack of calibration on NTSC as another huge factor. When you have TV manufacturers putting out sets with completely different defaults on top of broadcasters not being able to calibrate to any sort of standard TV appearance, you'd see a lot of different results per-station AND per-show.


I may be imagining things, but I think many early video cards actually output a TV or component video signal, which uses YUV (=YCbCr?) color, so it is RGB that would have required conversion, not YUV.


Video codecs generally use one or another type of YUV format, so lots of display hardware can already use it; on something like a mobile SoC or set-top box, no one can afford the memory bandwidth to do the conversion anyhow.

Carmack is giving an extremely simplified version of the world as it exists today. Your hardware can already use YUV formats, maybe it is even outputting YUV because you chose a resolution and bit depth that forced it to for bandwidth or clock constraints. Complex apps like browsers that do their own compositing can already choose to use YCbCr formats when supported for textures.


Overlay support tends to be very limited - it's effectively blitting to the final framebuffer after compositing, which means no transparency and a limited number of surfaces. The desire is for the compositor or any other shaders to be able to read YCbCr surfaces natively, converting to RGB as a part of the sample.

Namely, VK_KHR_sampler_ycbcr_conversion. I'm actually not sure if other APIs provide the equivalent yet, the HW capability is pretty recent...
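
For reference, setting one of these up in Vulkan looks roughly like this (a sketch; the NV12-style format and the 601/full-range choices are assumptions matching a typical JPEG source):

  #include <vulkan/vulkan.h>

  /* Create a Y'CbCr conversion object for an NV12-style 2-plane 4:2:0
     image (core in Vulkan 1.1 as VK_KHR_sampler_ycbcr_conversion). */
  VkSamplerYcbcrConversion make_jpeg_ycbcr_conversion(VkDevice device) {
      VkSamplerYcbcrConversionCreateInfo ci = {
          .sType = VK_STRUCTURE_TYPE_SAMPLER_YCBCR_CONVERSION_CREATE_INFO,
          .format = VK_FORMAT_G8_B8R8_2PLANE_420_UNORM,  /* NV12 layout */
          .ycbcrModel = VK_SAMPLER_YCBCR_MODEL_CONVERSION_YCBCR_601,
          .ycbcrRange = VK_SAMPLER_YCBCR_RANGE_ITU_FULL, /* JPEG is full range */
          .components = {0},                             /* identity swizzle */
          .xChromaOffset = VK_CHROMA_LOCATION_MIDPOINT,
          .yChromaOffset = VK_CHROMA_LOCATION_MIDPOINT,
          .chromaFilter = VK_FILTER_LINEAR,
          .forceExplicitReconstruction = VK_FALSE,
      };
      VkSamplerYcbcrConversion conv = VK_NULL_HANDLE;
      vkCreateSamplerYcbcrConversion(device, &ci, NULL, &conv);
      return conv;
  }

The conversion is then chained into a VkSamplerCreateInfo via VkSamplerYcbcrConversionInfo, after which shaders just see RGB when they sample.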


Android has had ColorMatrixColorFilter[1] since the beginning and YUV -> RGB conversion is actually an example use case they specify. Problem is, however, that many YUV formats do chroma subsampling, and you can't exactly put that into a regular bitmap without shuffling a lot of bytes around, which kinda defeats the purpose.

[1] https://developer.android.com/reference/android/graphics/Col...


There’s GL_MESA_ycbcr_texture for vanilla GL and GL_EXT_YUV_target for GLES, but I’ve no idea how well-supported these are. (Notably, the Mesa extension is ancient—the changelog says it’s been published in 2003.)


IIRC GL_MESA_ycbcr_texture is basically GL_APPLE_ycbcr_422, only supports packed 422, and was primarily intended for letting a compositor pass a 2vuy/yuvs surface to the overlay HW without actually converting it in GL.

GL_EXT_YUV_target is close, but it doesn't transparently include the specific step of YUV -> RGB conversion, just the sampling multiple planes + upsampling for chroma (and convenience functions for shader conversion.) (I think? actually not certain, since I can't find definition of how chroma is upsampled)

Well, I guess Vulkan's isn't completely transparent, since the conversion sampler has additional restrictions as to how it can be used. But at least with it shaders can be written to not care whether the source is RGB or not.


Overlays and GL/Vulkan extensions aren't strictly necessary, though support ultimately depends on the driver. For example, glamor does the conversion for some formats in a shader: https://gitlab.freedesktop.org/xorg/xserver/-/blob/8274dd664...


Yeah, those are some of the problems I was thinking of.


> makes me think that a substantial number of video cards have included YUV decoding capability in hardware since the previous millennium.

You are correct, and you can find details on that by reading the card documentation (as far as it is available). That was in particular the case when cards had a fixed-function rendering pipeline.

With the advent of more general purpose cores on the GPU for the rendering pipeline, YUV decoding has been offloaded to them.


And emulators like MAME too, I think.


Yeah, you need YUV to emulate NTSC or PAL.


You need it for a CRT shader, but games pretty much universally were in RGB internally and got converted on the way out. Especially for MAME, which is emulating games that would use RGB monitors.


I think MAME emulates some hardware for which NTSC artifact colors were important:

https://wiki.mamedev.org/index.php/Driver:Apple_II (which had no RGB at all)

https://old.reddit.com/r/OpenEmu/comments/hhxnry/sharing_my_... (showing Sonic, among others, though I'm not sure if this is really about NTSC artifacting or just dithering)

http://forum.arcadecontrols.com/index.php/topic,152465.msg15...

https://emulation.gametechwiki.com/index.php/NTSC_Filters

So, there are some exceptions, including a few extremely popular platforms, but certainly you are correct that emulating the majority of game platforms does not need NTSC artifacting.


For arcade, yes, but MAME does do consoles as well in both NTSC and PAL formats. MAME has a pretty high-quality MegaDrive/Genesis driver, for instance.


Genesis was RGB internally and supports RGB out, like I think every console generation after the Atari. PAL conversions were just about framerate.

https://www.retrorgb.com/genesis.html


Does Wayland support this?


One of the challenges with Y'CbCr (what Carmack is calling "YUV") is that there are so many flavors.

He mentions 4:2:0 chroma subsampling. But he doesn't mention chroma siting. Or alternative subsampling schemes. Or matrix coefficients. Or full-range vs video-range (a.k.a. JPEG vs MPEG range). Heck, how you even arrange subsampled data varies by system (many libraries like planar; Apple likes bi-planar; etc.).

I'd love to see more support for rendering subsampled Y'CbCr formats so you don't have to use so much RAM, but it gets complicated quick.


I don't think he's asking for video cards to natively handle JPEG images. ISTM what he is advocating for is keeping the JPEG decompressed in memory in a YUV format rather than RGB. The savings come from the fact that the UV parts are downsampled, not from any sort of compression.

So since you already have to process the image, it doesn't seem like a big ask to convert from JPEG-flavored YUV to GPU-flavored YUV. But I'm not an expert, so maybe this is hard/lossy?


I'm fully aware that he's not talking about decoding JPEG images, and instead is talking about keeping Y'CbCr formats in memory and rendering with Y'CbCr.

It's not lossy. But Y'CbCr -> RGB is not as simple as you might think. My whole post was about Y'CbCr -> RGB conversion. It's doable; I'm not saying it's impossible. There's just several various flavors of Y'CbCr and correctly handling all of them (or at least the majority/most common) gets tedious.


I think there's still some confusion.

Why does the GPU need to handle more than one flavor of Y'CbCr? Why can't the jpeg/PNG/whatever decoder be relied on to convert whatever flavor it uses to the Direct X/OpenGL flavors?


Is this a real problem? Surely something in the current pipeline understands what YUV flavor JPEG uses.


JPEG can use multiple: you can select the chroma-subsampling (with comp_info[0].h_samp_factor), and do 420, 422 or 444...


And the pipeline is able to convert it to RGB fine, so isn't the information available to use? Why would it be a challenge passing along that information?


The information is there (sometimes... it's amazing how many times you have to make a "reasonable" guess when decoding images/video because metadata is missing). Passing it along is feasible. But it does complicate things (especially the implementation) pretty quickly.

A lot of Y'CbCr -> RGB converters actually disagree with each other. They're all close enough that casual users don't notice or care about the small discrepancies.


I believe there was a HN post about greens being different in different browsers just recently.


https://news.ycombinator.com/item?id=27293266

Again about what to do when metadata is missing or conflicting. :)


OBS uses NV12 variant of YUV by default for its rendering pipeline, which is supported as a GPU pixel format by Direct3D. OBS has written various shaders to convert into and out of NV12 on GPU.


The submission is a link to a twitter thread. A tweet can have up to 280 characters. How many footnotes would you prefer John to have used?


Woah, easy there. I have no qualms with what John said. I don't expect someone to delve into all the complexities in Twitter. Overall I agree with John and I wish we had more ways to natively support Y'CbCr.

The point of my post was to help casual readers know that this gets complicated fast. Someone might think defining FMT_JPEG_YUV is easier and simpler than it actually is. I don't fault John for that. It's not a fault of anyone, really.


VK_KHR_sampler_ycbcr_conversion is defined to support the practically used matrix coefficients, range, siting, and has both 2 and 3 plane formats.


Oh neat, I didn't know about that one! I've never used Vulkan but looking at the API it does indeed cover the most common things. I wish we had more rendering APIs with that kind of support; I'd love to be able to use it.


I would prefer him not to use Twitter at all, and blog instead.


However, the attractiveness of privatizing network effects and the simplicity of doing so pretty much killed any hope of blogs being a long term pragmatic solution for self publishing. Time to move on or at least give up until the underlying driving forces are changed.


I feel daft, but:

  The vast majority of images are jpegs, which are internally 420 YUV, but they get converted to 32 bit RGB for use in apps. Using native YUV formats would save half the memory and rendering bandwidth, speed loading, and provide a tiny quality improvement.
What does he mean by using native YUV formats? Something (I wave my hand) in the rendering pipeline from the JPEG in memory to pixels on the screen?


> What does he mean by using native YUV formats?

Your display uses 3 bytes per pixel. 8 bits for each of the R, G, and B channels. This is known as RGB888. (Ignoring the A or alpha transparency channel for now).

YUV420 uses chroma subsampling, which means the color information is stored at a lower resolution than the brightness information. Groups of 4 pixels will have the same color, but each pixel can have a different brightness. Our eyes are more sensitive to brightness changes than color changes, so this is usually unnoticeable.

This is very advantageous for compression because YUV420 requires 6 bytes per 4 pixels, or 1.5 bytes per pixel, because groups of pixels share a single color value. That's half as many bytes as RGB888.

When you decompress a JPEG, you first get a YUV420 output. Converting from YUV420 to RGB888 doesn't add any information, but it doubles the number of bits used to represent the image because it stores the color value for every individual pixel instead of groups of pixels. This is easier to manipulate in software, but it takes twice as much memory to store and twice as much bandwidth to move around relative to YUV420.

The idea is that if your application can work with YUV420 through the render pipeline and then let a GPU shader do the final conversion to RGB888 within the GPU, you cut your memory and bandwidth requirements in half at the expense of additional code complexity.

Wikipedia is a good source of diagrams and details that explain this further: https://en.wikipedia.org/wiki/YUV#Y%E2%80%B2UV420p_(and_Y%E2...
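
To make the numbers concrete, here's a hedged C sketch of both the size math and the final conversion step, using the full-range JFIF coefficients (real code also has to worry about chroma siting and video vs. full range, as discussed elsewhere in the thread):

  #include <stddef.h>

  /* Planar YUV420: w*h bytes of Y plus (w/2)*(h/2) bytes each of Cb and Cr
     = 1.5 bytes/pixel, vs. 4 for RGBA8888. Even dimensions assumed. */
  size_t yuv420_size(size_t w, size_t h) {
      return w * h + 2 * ((w / 2) * (h / 2));
  }

  static unsigned char clamp8(int v) {
      return v < 0 ? 0 : v > 255 ? 255 : (unsigned char)v;
  }

  void yuv420_to_rgb(const unsigned char *y, const unsigned char *cb,
                     const unsigned char *cr, int w, int h,
                     unsigned char *rgb /* w*h*3 bytes out */) {
      for (int j = 0; j < h; j++) {
          for (int i = 0; i < w; i++) {
              int Y = y[j * w + i];
              /* One chroma sample covers a 2x2 block of pixels. */
              int u = cb[(j / 2) * (w / 2) + i / 2] - 128;
              int v = cr[(j / 2) * (w / 2) + i / 2] - 128;
              unsigned char *p = rgb + 3 * (j * w + i);
              p[0] = clamp8(Y + (int)(1.402f * v));               /* R */
              p[1] = clamp8(Y - (int)(0.344f * u + 0.714f * v));  /* G */
              p[2] = clamp8(Y + (int)(1.772f * u));               /* B */
          }
      }
  }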


In Commandos 2 we stored the huge map images in YCbCr 420 in memory, and converted to RGB during rendering via lookup tables rather than shaders (this was back in 1999, all CPU!). We compressed the range of each component below 8 bits, running a histogram so our bits represented values inside the range actually present in the original image. We got it down to 6 bits per pixel or something close to that.

Converting via lookup tables, one for each component, made it very cheap to perform palette-style tricks like tonemapping and smooth color shifts from day to night.

[Edit: it may have actually been CIE Lab, not YUV/YCbCr, because the a&b tended to have narrower ranges. It's been too long!]
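
Whatever the exact color space, the lookup-table trick is simple to picture (an illustrative sketch, not the actual Commandos 2 code): retargeting the whole image means rewriting 768 table entries instead of millions of pixels.

  /* One 256-entry table per component; day/night shifts and tonemapping
     become a rewrite of 3*256 table entries instead of touching pixels. */
  typedef struct { unsigned char y[256], cb[256], cr[256]; } ComponentLUT;

  void apply_luts(unsigned char *ybuf, int ylen,
                  unsigned char *cbbuf, unsigned char *crbuf, int clen,
                  const ComponentLUT *lut) {
      for (int i = 0; i < ylen; i++) ybuf[i] = lut->y[ybuf[i]];
      for (int i = 0; i < clen; i++) {
          cbbuf[i] = lut->cb[cbbuf[i]];
          crbuf[i] = lut->cr[crbuf[i]];
      }
  }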


The maps in Commandos 2 were beautiful and distinctive. I spent many hours playing through those levels.


If they were hard to play, I can assure you they were MUCH harder to create. I look back and find it hard to believe how much blood and tears the artists and level designers had to sweat. All the kudos go to them, but I admit I felt very happy and proud when I finally found the right tech to do them justice at runtime.


Do tell more! I would be curious about any details on the development of Commandos 2; I used to play that game a lot.


That doesn't really make sense to me. Even in pipelines that deal only with YUV (very common in professional video), you always upsample to 4:4:4 in a linear colorspace and convert back to 4:2:0 or 4:2:2 at the end.

How would you do even trivial things like color blending in 4:2:0?


The implication is that you would keep the data in its original form (or 420) only until you have to perform an operation to output a new image/rendering. You could then perform the operation (such as shading operations) in a larger color space on a graphics card, and finally afterward output again, perhaps as compressed YUV or perhaps as RGB over the wire in 8-, 10- or 12-bit (as you get to fancier HDR10+ and/or Dolby Vision). (Note in this post I'm using YUV instead of the more technically correct YCbCr...)

That said, this (storing JPEGs in YUV420) is just an optimization, and the more images and displays go HDR, the less frequently we'll see YUV JPEGs, though we could see dithered versions, maybe, in 444. That's basically the same thing, but once you discard 420 and 422 you might as well use standard 8-bit RGB and skip the complexity of YCbCr altogether. If you're curious about HDR as I was: though 10-bit is "required", you can dither HDR to 8-bit and not notice the difference unless doing actual colour grading (where you need the extra detail, of course). For obvious reasons, I've never heard of anyone dithering HDR to YUV 420, though, and most computer screens look pretty terrible when output to a TV as YUV422 or YUV420.


Real-time games are bandwidth limited. 100% increase is a real cost. Maybe worth some awkwardness in the engine.


Wow, you explained that incredibly well, thanks! I knew nothing about what Carmack was talking about here, but your explanation makes it pretty clear.


In what way is RGB easier to manipulate? I get that it's the format we have to send to the monitor. Also, current tools may be geared more towards it. But on the face of it, I don't see why it is simpler. Do you have any good examples?

I can think of several image processing tasks which are more straightforward in a luma/chroma format. Maybe it's because I'm more used to working with the data in that form?


>In what way is RGB easier to manipulate?

Probably because it's the native input format for every display technology in existence? If you're going to twiddle an image, you might want to do it in the format that it's displayed in.


Mostly the issue is the layout, not the color space. YUV420 is awkward to handle because it's "420", not because it's YUV. There wouldn't be too many issues if everything ran on the 1-byte-per-pixel-component YUV444, besides converting to/from RGB for certain processing steps.


It's quite common to upsample to 444 for processing, but using 420 for storage.


This was such a good explanation, thank you for taking the time.

I think a twitter account called @JohnCarmackELI5 that just explains all of John's tweets like I have no idea what he's talking about would be invaluable. The man is obviously bursting with good/interesting stuff to say, but I grok it like 5% of the time.


Thanks, this is a very clear explanation.


That was an excellent explanation, thank you.


Thank you for the clear explanation. I know little about graphics, but this was easy to follow.


I would imagine a shader that converts YUV to ARGB at the time of rendering as opposed to storing it all the way along the pipeline as 32 bit integers.

It's a bit tricky because rendering pipelines composite the final image through many layers of offscreen compositing before the pixel hits the screen.

The core issue is that the offscreen composited layers would still be 32-bit textures, which is a bigger issue. I could imagine a Skia-based draw list encoding this through the pipeline, which could perhaps help preserve it.


Yup. You typically first decode the JPEG into a raw image format in GPU memory & then that gets transferred to scanout HW that actually sends it to the display (e.g. via HDMI). The argument feels a bit weak though. At that point why not just use JPEG through your entire pipeline & save even more bandwidth. Of course you have to decompress if you’re modifying the image in any way so it doesn’t help there.


You can't texel-fetch directly from JPEG data, it has to be decompressed first.


Yes of course. GPUs already support compressed textures though that get decompressed on the fly. Of course JPEG has a lot of baggage that actually makes it a poor format for this but perhaps newer ones might be useful. What you lose in “optimality” (compression schemes designed for GPUs are better for games), you win in “generality” (ability to use this outside games to, for example, improve web browsing battery life).


There's a very hard distinction between GPU texture compression formats and general image compression formats - the former need to support random access lookup in constant time. Anything meeting that criterion is not going to be generally competitive; it's like the difference between UTF-32 strings and UTF-8 strings.


You could do the YUV to RGB conversion operation in your pixel shaders in the GPU. That way you save a bit of bandwidth compared to uncompressed RGB.

It's been done. There are even GPUs that support this operation natively, so there's no additional overhead.


The future of images is JPEG XL - https://jpeg.org/jpegxl/

A great overview and explanation: https://cloudinary.com/blog/time_for_next_gen_codecs_to_deth...


I don't know, JPEG has the "problem" of being good enough for what it is used for. We could do better for decades but because everyone understands JPEG it is still the standard. And if JPEG doesn't fit, chances are that PNG does.

There are alternatives. Google optionally uses webp, because it saves them bandwidth and they control the entire chain. And specialized applications like maps sometimes use better suited formats, but if you want to share a picture, that's JPEG.

The same thing happens with audio. The go-to format is still MP3 even though it is well outdated. In fact, we have had what is close to the perfect lossy audio codec (Opus) since 2012, and it is pretty much only used when you control the entire chain, like in video games.

So maybe JPEG XL is the perfect image format, but it is still a tough sell when you have something universal that is good enough against it.

The only way I can see it gain traction is if tech giants get together and force it on us. Like they have been doing for web standards. And if there are patents, even that may not work.

Video is different because it uses a huge amount of bandwidth and no formats are "good enough" yet.


JPEG XL's party trick is that you can losslessly (and reversibly) transcode regular JPEG images to XL images, at less CPU cost than fully decoding and then encoding, for a size savings of ~30%. I think this will be enough to put it over the top and drive adoption as CDNs will be able to transparently serve either JPEG or XL for a resource originally uploaded as a standard JPEG depending on whether the requesting user agent supports XL.


It is, but that's not relevant to what Carmack was talking about.


Compared to JPEG2000, it feels like it is actually trying to offer an upgrade path.


I mean the fact that JPEG has won out for so long suggests that it's anyone's guess what will replace it.


The great thing about XR is that ordinary JPEG images can be compressed with XR and receive a reduction in size without any reduction in visual quality. Once all your tools receive support for XR, you can convert your entire image library in one fell swoop with no fear.

If that isn't an upgrade path, I don't know what is.


You mean XL - XR is also a thing, but itself quite old by now, and not really likely to break through any more.


Ack, I do indeed. I blame the JPEG org for having such confusing names!


IIRC this optimization has already been used by Opera Mobile and Edge in the past: upload YUV to the GPU RAM, and then convert pixels to RGB on the fly when displaying. Unfortunately, I can't find the links to their respective blog posts (<shakes fist at Google substituting all keywords with their synonyms and searching recent pages only>).

However, chroma subsampling is a very primitive form of 2x2 block compression (NB: not related to JPEG's DCT block size). These days GPUs support much better compression natively, with 4x4 blocks, and much fancier modes (ETC1, ASTC). With clever encoding of such textures, it's even possible to achieve compression ratio comparable with JPEG's, while having a straightforward way to convert the compressed file to the compressed texture format.


> shakes fist at Google substituting all keywords with their synonyms and searching recent pages only

You can set a time range in Google; "Tools" link under the search bar. DuckDuckGo has this as well. Super useful, especially if there's some recent/prominent news and you want to find things other than this news, or if you're looking specifically for older stuff.

I think you can use "word" (with quotes) to prevent the synonym thing; not entirely sure about that.


Just searched `opera browser "jpg" "yuv"`, no date filtering. Second result is:

https://timkadlec.com/remembers/2018-03-22-compressive-image...

Seems to be it?


I use the Tools > Time Range feature constantly. It's extremely useful.

I don't know if quotes prevent synonyms, but verbatim mode should.



This would save memory but add compute cost if the image is drawn to the framebuffer or another sRGB buffer more than once or twice. It wouldn't necessarily be a win to make this behavior the default everywhere.

In the case of web browsers it depends how the image is used. An <img> with a large JPEG is probably drawn only once during tile rasterization, and browsers could certainly use the memory savings, so it would probably be a win. But if you had a small JPEG used as a page background and tiled over the whole screen, the memory savings would be small and you'd be wasting power converting the same pixels from YUV to sRGB over and over, so that would likely be a loss.


That is not necessarily true. In some cases, the compositor may be configured to sample the input buffer directly from YUV420, applying the transformation on scanout and thereby saving memory bandwidth to read from the framebuffer. This makes a tremendous amount of sense when the source is a video, but much less sense when the source may be rendered vector graphics, which generally look pretty bad with subsampled chroma.


Sure, if you can use a YUV framebuffer then that can save memory bandwidth during scanout (though the conversion to RGB still happens because the screen is not YUV). But that doesn't apply in the tiled page background case I mentioned, as web page contents are composited in RGB.


In some cases compute trade-off is still cheaper than memory latency.


If the goal is to save memory, you can go farther than this. You could store the JPEG in RAM, compressed, and have the GPU decompress & shade. Each of the DCT blocks can be converted in parallel on the GPU.


You can't do image decompression in GPGPU because the last step (Huffman coding) is not parallelizable - in fact if any compression is parallelizable it's not compressed enough. But you can do it in dedicated hardware.


Ok, maybe not the arithmetic code (or huffman), but I was thinking you'd at least invert the DCT on GPU.


Maybe I'm too stupid to understand this but AFAIK YUV isn't exactly linear so you still need to convert to linear space (trichromatic linear color space like RGB/XYZ)... no?


Indeed but that's also true for RGB (which is typically sRGB on computers) and you can have linear YUV if you want. "RGB" and "YUV" can really mean a whole bunch of things. There are many RGBs and probably twice as many YUV due to the hell that's video standards.

I'm not sure I understand where Carmack is coming from here though (am I missing some context? I don't use twitter and these threads are always a huge pain for me to follow especially since Carmack doesn't even bother breaking on full sentences). I don't get how processing in YUV instead of RGB has anything to do with 10bit components for instance.

Also, in my experience most video software deals with YUV natively and only converts as needed. It's probably different in the gaming and image processing world but that's because everything else is RGB and it seems to be a big ask to just tell everybody to convert to YUV.

Besides, if quality is of the essence, you will typically store more than 10 bits for internal processing, probably 16, and maybe even floats if you want to have as much range as possible.

I dunno, I won't pretend that I'm smarter than Carmack, but I wish there was a bit more context because it's a bit opaque for me at the moment.


> I don't use twitter and these threads are always a huge pain for me to follow especially since Carmack doesn't even bother breaking on full sentences

This site (threadreaderapp.com) may be of interest to you. It aggregates threads into a readable column as if it were a single article, here's Carmack's "thread": https://threadreaderapp.com/thread/1400930510671601666.html

Extremely useful for dialogues/conversations on twitter.



YUV is a linear transformation of RGB to the best of my knowledge. Wikipedia seems to agree.


JPEG uses Y'CbCr, not YUV per se. The matrix transformation is applied to gamma-encoded R'G'B' (typically sRGB), not linear RGB.

http://poynton.ca/PDFs/YUV_and_luminance_harmful.pdf
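
Concretely, the JFIF forward transform looks like this (a minimal sketch; note the inputs are the gamma-encoded sRGB bytes, which is exactly the point being made above):

  /* JFIF forward transform (BT.601-derived coefficients, full range).
     Inputs are gamma-encoded R'G'B' bytes (e.g. sRGB), not linear light.
     Truncates; production code would round and clamp. */
  void rgb_to_ycbcr(unsigned char r, unsigned char g, unsigned char b,
                    unsigned char *y, unsigned char *cb, unsigned char *cr) {
      *y  = (unsigned char)( 0.299000 * r + 0.587000 * g + 0.114000 * b);
      *cb = (unsigned char)(-0.168736 * r - 0.331264 * g + 0.500000 * b + 128);
      *cr = (unsigned char)( 0.500000 * r - 0.418688 * g - 0.081312 * b + 128);
  }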


The only important bit of pedantry over analog/digital and prime is reminding people that no one deals with linear light unless you already know you're doing so, and that blending in nonlinear space is rarely correct, no matter how common it is to do so.


I think we are in agreement? What you are saying is that conversion from linear YUV to linear RGB is, indeed, linear.

Further, the transformation from Y'CrCb to gamma-encoded R'G'B' is also a linear operation. Right?


It depends what you want to do with the data. If you want to send your sRGB-profile Y'CbCr JPEG image directly to an sRGB display then you are done after applying a matrix.

If you want to composite multiple images, do other intermediate processing, or display the image on an arbitrary non-sRGB display, you probably want to convert to a linear space along the way.


So we are in full agreement, then.


Does anyone know what fraction of jpegs in the real world are 4:2:0?

For photos I use whatever the camera likes, but for everything else I use 4:4:4.


It's good to remember that image sensor pixels only have one colour anyway (they are each either G, R, or B). 67% of colour information is made up right from the start, before you even encode anything. Another reason why 4:2:0 sampling may not be as bad as it sounds.


For cameras outputting at their max megapixel rating, yes.

If you're outputting at a lower resolution or not using a camera then it can be a notable loss of quality.

Most web images are probably scaled down, and in other contexts I bet it's similar. If you're looking at a raw camera shot you're usually only dealing with one at a time, so while simplicity is nice the RAM impact of those will be limited.


If you're looking to save on storage/bandwidth (hence not using the max resolution of your camera) it can absolutely be a good idea to spend more bits on Y than on C.


Yeah, high-quality exports (definitely from Lightroom) are normally YCbCr 4:4:4. You don't always need it, but often you do if you really care about fidelity.

I know several imaging apps have the ability to select which subsampling type to use as well.


Going by the terrible colour bleed I see regularly I'd say quite a few of them, somewhere along their lifetimes.

Seriously though chroma subsampling is not kind on any kind of red shape, it's especially bad against white backgrounds.


That's because JPEG decoders use poor upscaling algorithms. I think libjpeg uses nearest neighbor.

It's pretty easy to write a better one, and since the full-resolution image is already available for the Y plane, you can even do super resolution.
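
Even plain bilinear interpolation of the chroma planes is a step up from sample replication. A hedged sketch (true Y-guided upsampling would additionally weight the taps by luma similarity, and the exact sample phase depends on chroma siting):

  /* Bilinear sample of one chroma plane c (cw x ch) at fractional chroma
     coordinates (x, y). Edge-clamped. */
  unsigned char bilerp_chroma(const unsigned char *c, int cw, int ch,
                              float x, float y) {
      if (x < 0) x = 0;
      if (y < 0) y = 0;
      int x0 = (int)x, y0 = (int)y;
      int x1 = x0 + 1 < cw ? x0 + 1 : x0;
      int y1 = y0 + 1 < ch ? y0 + 1 : y0;
      float fx = x - x0, fy = y - y0;
      float top = c[y0 * cw + x0] * (1 - fx) + c[y0 * cw + x1] * fx;
      float bot = c[y1 * cw + x0] * (1 - fx) + c[y1 * cw + x1] * fx;
      return (unsigned char)(top * (1 - fy) + bot * fy + 0.5f);
  }

  /* For a full-res pixel (i, j) in 4:2:0, sample at roughly (i/2.0, j/2.0);
     the exact half-texel offset depends on the chroma siting in use. */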


Well easy is relative, but yeah you can do quite a lot better than the basic bilinear or bicubic upsampling that seems to be common. However most common image tools (and video players) don't seem to put in the effort, with very few exceptions.


ImageMagick seems to encode JPEGs as YUV444 for quality > 90, and I saw at least one other JPEG encoder use the same quality threshold.


> Does anyone know what fraction of jpegs in the real world are 4:2:0?

Most of them. Pretty much the default subsampling for libjpeg(-turbo).


I sampled my jpegs randomly. My image downloads seem to be all 420, and the photos 422.


Any jpeg on social media is going to be chroma subsampled when the site recompresses it.


Twitter is a bad blogging platform.


I gave up after the first section after mashing some bits of the 'UI' to try to read the next bit.. simply posting a link could provide a far better reading experience imo.


I am fascinated by his use of backslash-escaping the tweet EOD to signify that it's being followed by another one.


I suppose it takes fewer characters than marking the ordinal of the tweet, and it's probably familiar to Carmack's technically minded audience as escaping the return character in a command line (to add a new line to the command instead of running it).


It is also how you continue a line in a preprocessor macro.


> The vast majority of images are jpegs, which are internally 420 YUV, but they get converted to 32 bit RGB for use in apps. Using native YUV formats would save

...

> You can do it today, but you need to do the color conversion manually in a shader, which can be a big ask for some devs

Where the perf. really matters, and shaders are involved, doesn't everyone use texture compression formats like ETC1 and PVRTC? https://developer.android.com/guide/playcore/asset-delivery/...


They are used widely in games. But not so much elsewhere. Non-game-app designers get upset by the artifacts. I've shipped games using them for UI. We were happy to tolerate the artifacts in exchange for the 8X memory usage & memory bandwidth reduction (RGBA8888 vs. PVRTC4).

However, I've heard that Netflix was one of the first sponsors of https://github.com/BinomialLLC/basis_universal They are shipping innumerable images to a huge variety of underpowered set top boxes. Totally worth investing in a giant-leap memory optimization.


ETC1/PVRTC are lossy. If your source material is 4:2:0 then it's best to just stay with it. If you convert to RGB and then compress you don't really gain anything except a slight reduction in quality.


> and shaders are involved, doesn't everyone use texture compression formats like ETC1 and PVRTC?

Yes, when they can pay the computational price for compression beforehand. It's not going to work if you have to compress in real time.


Interesting approach, but what about more modern alternatives like webp and avif? I know that jpeg has its advantages, but the current lack of support is a problem that time will solve.


IIRC all video codecs (WebP and AVIF are simply intra frames from VP8 and AV1 respectively) work in the YUV color space because the color information is much lower frequency than brightness, and thus YUV is more easily compressible than RGB. Chroma subsampling (making each U and V correspond to a 2x2 block of Y) works for the same reason.


This sounds exactly like Cloudflare could add this as a 1-click feature to save 50% bandwidth.


> You can do [direct use of YUV] today, but you need to do the color conversion manually in a shader

Many mobile GPUs support an extension that does YUV conversion for you (GL_EXT_YUV_target). Maybe it's of less interest to desktop GPU vendors?


I think the problem is moving the dev flow from layered bitmaps to a shaded flow.


I thought he was working on AGI


I think a lot of the input data to his models are images.


That's only some of the threads.


Not only is he working on AGI, but he has some of the most interesting and unique ideas about how to achieve it.

Everyone is currently approaching AGI in what you might call a traditional way. Carmack's way is completely different, which was refreshing.

I think Carmack's technique has the highest chance of reaching AGI. The tweet chain isn't as unrelated as it seems.


Do you have a link outlining the details of his approach?


It was in DMs, so I don't feel comfortable presenting the idea publicly. He'll talk about it when he's ready.

You could try asking him. He likes talking about this, especially if you phrase your questions well.


Yeah, I'm more convinced by Bellard's BPG. The comparison speaks for itself:

https://bellard.org/bpg/lena.html


Crysis 3 used YUV 4:2:2 in its G Buffer. I think only the final tonemapped LDR image was converted to RGB. It was a peculiar optimization that I haven't seen anywhere else again.


Please stop using twitter John.

It's the 21st century and brilliant minds are posting thoughts in broken unreadable parts on websites that completely don't work unless you're a religious "the modern web requires javascript REEEEEEE" zealot.


Given that the article is about inefficient use of content delivery technology, it is poetically ironic to try (and kinda fail) to post said article on a website that doesn't allow the article to be fully posted.


(a) Unless I'm mistaken, there are three tweets. Three.

(b) Can you use curl? Maybe hit the API.

(c) The modern web DOES require JavaScript. Sure, I'm not clear on what that has to do with reading three tweets available on a public API, but it's pretty weird to object to a declarative statement of truth.


Hear, hear! This format stands right in the way of simply reading.


Meh. PNGs are the way to go. All the advantages, none of the drawbacks. And it has alpha.


The vast majority of the time, an F-15 flies at subsonic speed. Converting it to a propeller-driven design would save a lot of fuel and production and maintenance costs.

YUV format is lossy. The variant that is twice as bandwidth-efficient as RGB records four times less color-difference information than full-bandwidth YUV or RGB. Add to that that RGB is often used with an alpha channel, and for half the bandwidth you will get twice the artifacts.

The subsampling scheme that is half of RGB is called 4:1:1: https://en.wikipedia.org/wiki/Chroma_subsampling#4:1:1

It is not broadcast quality. 4:2:0 is also not good enough by today's standards.


You've missed the point. He isn't advocating for everyone to suddenly use JPEG where they wouldn't before. He's advocating that, since there are so many JPEGs already in use, the architecture for displaying them should avoid unnecessarily converting them into RGB 32bit, and instead display them without conversion in a YUV 420 format, to align with the internal encoding of JPEGs which is YUV 420.

Essentially, rendering JPEGs directly as a GPU texture (since GPUs can already natively use 420 encodings for textures). His point is that this would improve performance, very slightly improve quality, and reduce power usage as there's no colour space conversion.


Mathematically Y'CbCr is a lossless conversion of RGB. Practically there are some losses, though, especially if you use video-range.

4:1:1 is half of RGB but I've never seen anyone ever use it. Pretty much everything in use today (at the consumer level) uses 4:2:0. It's good enough for today's standards (AV1 Main profile only supports 4:0:0 and 4:2:0).


>Mathematically Y'CbCr is a lossless conversion of RGB.

Assuming infinite precision, sure.

Interestingly, however, there's a closely related color space called YCoCg that actually does achieve lossless conversion to/from RGB (this lossless variant is often referred to as YCoCg-R, the R stands for "reversible"): https://en.wikipedia.org/wiki/YCoCg#The_lifting-based_YCoCg-...
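
The lifting-based transform is short enough to quote in full (a sketch following the Wikipedia description; it assumes arithmetic right shift for negative values, which virtually every platform provides):

  /* Lifting-based YCoCg-R: integer-exact both ways, so the round trip
     is lossless. Co and Cg need one extra bit of range (9 bits signed
     for 8-bit RGB). */
  void rgb_to_ycocg_r(int r, int g, int b, int *y, int *co, int *cg) {
      *co = r - b;
      int t = b + (*co >> 1);
      *cg = g - t;
      *y  = t + (*cg >> 1);
  }

  void ycocg_r_to_rgb(int y, int co, int cg, int *r, int *g, int *b) {
      int t = y - (cg >> 1);
      *g = cg + t;
      *b = t - (co >> 1);
      *r = *b + co;
  }

Round-tripping any RGB triple through these two functions returns the original values exactly; the only cost is the extra bit of range on the two chroma components.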


This is correct, with one subtlety.

YUV to RGB conversion (and vice-versa) requires use of conversion coefficients (BT.470, BT.709, etc): https://en.wikipedia.org/wiki/YUV

Different apps/algorithms can choose their own coefficients, so you get a slightly different RGB colors. If you converted RGB back to YUV with a different set of coefficients, you would get a different result.


JPEG specifies the matrix; I think it's the same as BT.470. It's fairly easy to guess the intent even if the metadata is missing (which, well, it usually is) - you just have to know if the source is a video or not, and then if it's HD or not.


What’s 4:1:1? What people call 4:2:0 is actually 4 pixels of Y and 1 each of Cb and Cr.


Wikipedia has a decent article with visuals: https://en.wikipedia.org/wiki/Chroma_subsampling#Sampling_sy...

The difference between 4:1:1 and 4:2:0 is that in 4:1:1 the chroma covers a 1x4 line of pixels whereas in 4:2:0 the chroma covers a 2x2 square of pixels.


TIL. Thanks!



