The vast majority of images are JPEGs, which are internally 420 YUV, but they get converted to 32-bit RGB for use in apps. Using native YUV formats would save half the memory and rendering bandwidth, speed loading, and provide a tiny quality improvement.
What does he mean by using native YUV formats? Something (I wave my hand) in the rendering pipeline from the JPEG in memory to pixels on the screen?
Your display uses 3 bytes per pixel: 8 bits for each of the R, G, and B channels. This is known as RGB888. (Ignoring the A, or alpha transparency, channel for now.)
YUV420 uses chroma subsampling, which means the color information is stored at a lower resolution than the brightness information. Each 2x2 group of 4 pixels shares the same color values, but every pixel keeps its own brightness. Our eyes are more sensitive to changes in brightness than changes in color, so this is usually unnoticeable.
This is very advantageous for compression: YUV420 needs 6 bytes per 4 pixels (4 luma bytes plus one byte each for the two chroma channels), or 1.5 bytes per pixel, because groups of pixels share a single color value. That's half as many bytes as RGB888.
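To put numbers on it, here's a quick sketch (assuming tightly packed planes with no row padding, and a 1080p frame that I picked arbitrarily):

    #include <stdio.h>
    #include <stddef.h>

    /* Byte counts for a w x h image in each layout, assuming tightly
       packed planes with no row padding or alignment. */
    static size_t bytes_rgb888(size_t w, size_t h)   { return w * h * 3; }
    static size_t bytes_rgba8888(size_t w, size_t h) { return w * h * 4; }
    static size_t bytes_yuv420(size_t w, size_t h)
    {
        /* full-resolution Y plane plus quarter-resolution Cb and Cr planes */
        return w * h + 2 * ((w / 2) * (h / 2));
    }

    int main(void)
    {
        size_t w = 1920, h = 1080;
        printf("RGB888:   %zu bytes\n", bytes_rgb888(w, h));   /* 6,220,800 */
        printf("RGBA8888: %zu bytes\n", bytes_rgba8888(w, h)); /* 8,294,400 */
        printf("YUV420:   %zu bytes\n", bytes_yuv420(w, h));   /* 3,110,400 */
        return 0;
    }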
When you decompress a JPEG, you first get YUV420 output. Converting from YUV420 to RGB888 doesn't add any information, but it doubles the number of bytes used to represent the image, because it stores a color value for every individual pixel instead of one per group. This is easier to manipulate in software, but it takes twice as much memory to store and twice as much bandwidth to move around relative to YUV420.
The idea is that if your application can keep working with YUV420 through the render pipeline and let a GPU shader do the final conversion to RGB888, you cut your memory and bandwidth requirements in half, at the expense of additional code complexity.
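For what it's worth, the per-pixel math the shader would have to do is tiny. Here's a rough CPU-side C sketch of the full-range BT.601 conversion that JPEG/JFIF uses (the function and plane names are mine, and it assumes tightly packed planes with even dimensions):

    #include <stddef.h>

    static unsigned char clamp8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (unsigned char)v; }

    /* Convert planar YUV420 (full-range BT.601, as used by JPEG/JFIF)
       to interleaved RGB888. Each 2x2 block of pixels shares one Cb and one Cr. */
    static void yuv420_to_rgb888(const unsigned char *y_plane,
                                 const unsigned char *cb_plane,
                                 const unsigned char *cr_plane,
                                 unsigned char *rgb, size_t w, size_t h)
    {
        for (size_t row = 0; row < h; row++) {
            for (size_t col = 0; col < w; col++) {
                int y  = y_plane[row * w + col];
                int cb = cb_plane[(row / 2) * (w / 2) + (col / 2)] - 128;
                int cr = cr_plane[(row / 2) * (w / 2) + (col / 2)] - 128;

                unsigned char *px = &rgb[(row * w + col) * 3];
                px[0] = clamp8(y + (int)( 1.402f    * cr));                   /* R */
                px[1] = clamp8(y + (int)(-0.344136f * cb - 0.714136f * cr));  /* G */
                px[2] = clamp8(y + (int)( 1.772f    * cb));                   /* B */
            }
        }
    }

A fragment shader sampling separate Y and CbCr textures would do the equivalent multiply-add per output pixel.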
In Commandos 2 we stored the huge map images in YCbCr 420 in memory and converted to RGB during rendering via lookup tables rather than shaders (this was back in 1999, all CPU!). We also compressed each component to fewer than 8 bits, running a histogram so our bits only covered the range of values actually present in the original image. We got it down to 6 bits per pixel or something close to that.
Converting via lookup tables, one for each component, made it very cheap to perform palette-style tricks like tonemapping and smooth color shifts from day to night.
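Not the actual Commandos 2 code, obviously, just a sketch of the general per-component lookup-table idea (the names and the simple brightness curve are made up): precompute each component's contribution once, and bake whatever tonemap or day/night curve you want into the Y table, so the per-pixel work is only lookups and adds.

    /* 256-entry tables, rebuilt whenever the "time of day" changes. */
    static int y_lut[256];                 /* brightness / tonemapping curve baked in */
    static int r_from_cr[256], b_from_cb[256];
    static int g_from_cb[256], g_from_cr[256];

    static void build_luts(float brightness)   /* e.g. 1.0f for day, 0.4f for night */
    {
        for (int i = 0; i < 256; i++) {
            y_lut[i]     = (int)(i * brightness);
            r_from_cr[i] = (int)( 1.402f    * (i - 128));
            g_from_cb[i] = (int)(-0.344136f * (i - 128));
            g_from_cr[i] = (int)(-0.714136f * (i - 128));
            b_from_cb[i] = (int)( 1.772f    * (i - 128));
        }
    }

    static unsigned char clamp8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (unsigned char)v; }

    /* Per-pixel work: five table lookups, a few adds, and clamps. */
    static void ycc_to_rgb(unsigned char y, unsigned char cb, unsigned char cr,
                           unsigned char *r, unsigned char *g, unsigned char *b)
    {
        int yy = y_lut[y];
        *r = clamp8(yy + r_from_cr[cr]);
        *g = clamp8(yy + g_from_cb[cb] + g_from_cr[cr]);
        *b = clamp8(yy + b_from_cb[cb]);
    }

Shifting the palette for dusk or night then just means rebuilding a few 256-entry tables instead of touching every pixel of the map.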
[Edit: it may have actually been CIE Lab, not YUV/YCbCr, because the a and b channels tended to have narrower ranges. It's been too long!]
If they were hard to play, I can assure you they were MUCH harder to create. I look back and find it hard to believe how much blood, sweat, and tears the artists and level designers had to put in. All the kudos go to them, but I admit I felt very happy and proud when I finally found the right tech to do them justice at runtime.
That doesn't really make sense to me. Even in pipelines that deal only with YUV (very common in professional video), you always upsample to 4:4:4 in a linear colorspace and convert back to 4:2:0 or 4:2:2 at the end.
How would you do even trivial things like color blending in 4:2:0?
The implication is that you would keep the data in its original form (or 420) only until you have to perform an operation that produces a new image/rendering. You could then perform the operation (such as shading) in a larger color space on the graphics card, and afterward output again, perhaps as compressed YUV or perhaps as RGB over the wire at 8, 10, or 12 bits (as you get into fancier HDR10+ and/or Dolby Vision). (Note that in this post I'm using YUV instead of the more technically correct YCbCr...)
That said, this (storing JPEGs in YUV420) is just an optimization, and as more images and displays go HDR, the less frequently we'll see YUV JPEGs, though maybe we'll see dithered versions in 444. That's basically the same thing, but once you discard 420 and 422 you might as well use standard 8-bit RGB and skip the complexity of YCbCr altogether. If you're curious about HDR as I was: though 10-bit is "required", you can dither HDR to 8-bit and not notice the difference unless you're doing actual colour grading (where you need the extra detail, of course). For obvious reasons, I've never heard of anyone dithering HDR to YUV420 though, and most computer screens look pretty terrible when output to a TV as YUV422 or YUV420.
In what way is RGB easier to manipulate? I get that it's the format we have to send to the monitor. Also, current tools may be geared more towards it. But on the face of it, I don't see why it is simpler. Do you have any good examples?
I can think of several image processing tasks which are more straightforward in a luma/chroma format. Maybe it's because I'm more used to working with the data in that form?
Probably because it's the native input format for every display technology in existence? If you're going to twiddle an image, you might want to do it in the format that it's displayed in.
Mostly the issue is the layout, not the color space. YUV420 is awkward to handle because it's "420", not because it's YUV. There wouldn't be too many issues if everything ran on YUV444 at one byte per component, aside from converting to/from RGB for certain processing steps.
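To illustrate the layout awkwardness (the plane names and the packed, even-dimension assumption are mine): in planar 420, touching a single pixel means indexing three planes at two different resolutions, and writing a chroma sample changes the color of four pixels at once.

    #include <stddef.h>

    /* Sample pixel (x, y) from tightly packed planar buffers. */

    /* YUV444: every component is full resolution, so indexing is uniform. */
    static void sample_yuv444(const unsigned char *y_p, const unsigned char *u_p,
                              const unsigned char *v_p, size_t w,
                              size_t x, size_t y, unsigned char out[3])
    {
        size_t i = y * w + x;
        out[0] = y_p[i]; out[1] = u_p[i]; out[2] = v_p[i];
    }

    /* YUV420: luma is full resolution, chroma is quarter resolution, so the
       same (x, y) maps to different offsets per plane, and writing u_p/v_p
       here would recolor a whole 2x2 block. */
    static void sample_yuv420(const unsigned char *y_p, const unsigned char *u_p,
                              const unsigned char *v_p, size_t w,
                              size_t x, size_t y, unsigned char out[3])
    {
        size_t ci = (y / 2) * (w / 2) + (x / 2);
        out[0] = y_p[y * w + x]; out[1] = u_p[ci]; out[2] = v_p[ci];
    }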
This was such a good explanation, thank you for taking the time.
I think a Twitter account called @JohnCarmackELI5 that just explains all of John's tweets like I have no idea what he's talking about would be invaluable. The man is obviously bursting with good/interesting stuff to say, but I grok it like 5% of the time.
I would imagine a shader that converts YUV to ARGB at render time, as opposed to storing the image as 32-bit integers all the way along the pipeline.
It's a bit tricky, because rendering pipelines build the final image through many layers of offscreen compositing before a pixel ever hits the screen.
The core issue is that those offscreen composited layers would still be 32-bit textures. I would imagine a Skia-based draw list that encodes this through the pipeline could perhaps help preserve it.
Yup. You typically first decode the JPEG into a raw image format in GPU memory & then that gets transferred to the scanout HW that actually sends it to the display (e.g. via HDMI). The argument feels a bit weak though. At that point, why not just use JPEG through your entire pipeline & save even more bandwidth? Of course you have to decompress if you're modifying the image in any way, so it doesn't help there.
Yes, of course. GPUs do already support compressed textures that get decompressed on the fly, though. JPEG has a lot of baggage that actually makes it a poor format for this, but perhaps newer formats might be useful. What you lose in "optimality" (compression schemes designed for GPUs are better for games), you win in "generality" (the ability to use this outside games to, for example, improve web-browsing battery life).
There's a very hard distinction between GPU texture compression formats and general image compression formats: the former need to support random-access lookup in constant time. Anything meeting that criterion is not going to be generally competitive; it's like the difference between UTF-32 strings and UTF-8 strings.
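To make the constant-time point concrete: a fixed-rate block format like BC1/DXT1 spends exactly 8 bytes on every 4x4 texel block, so the offset of the block holding any given texel is a couple of divides and a multiply, and the GPU can fetch and decode just that block. A JPEG has no such property; you'd have to entropy-decode everything up to the region you want. A sketch (assuming texture dimensions are multiples of 4):

    #include <stddef.h>

    /* BC1/DXT1 stores each 4x4 texel block in 8 bytes. Finding the block that
       holds texel (x, y) is constant-time arithmetic; no sequential decode. */
    static size_t bc1_block_offset(size_t tex_width, size_t x, size_t y)
    {
        size_t blocks_per_row = tex_width / 4;
        size_t block_x = x / 4, block_y = y / 4;
        return (block_y * blocks_per_row + block_x) * 8;
    }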