I have mixed feelings about this memo. It's right about practical aspects of resampling filters, but tries too hard to justify that with sampling theory. For example, pixel-aligned sharp edges exist and are meaningful in images, unlike perfectly square waves in sampling theory.
There's no problem with viewing an image as a wave. However, if you do, you also need to accept that it contains frequency components well beyond the Nyquist frequency, so the sampling theorem will be of no help in reconstructing the image.
Still, lots of operations on images can be stated in terms of convolution (or a regularized inverse of it), so it's not like Fourier analysis is entirely useless.
Pixel-aligned sharp edges do not actually exist and are not meaningful in images, because a perfectly sharp lens does not exist (and cannot exist), so you can never form a perfectly sharp edge in an image. Defocus prevents it as well, and a lens with a wider depth of field has an immediately noticeable limit in sharpness.
Even if you were somehow able to create a perfect lens, you would not be able to create a perfectly sharp edge with real world objects.
See, here you're committing the same error that the paper does: pretending pixels are all about photography and optics, and ignoring that some computer-generated graphics actually are supposed to represent perfect squares.
Sometimes, pixels really are little squares. Not always, but not never, either.
I disagree. Would you say a specific address in an array has edges? No, because it's an address, a fixed point. Even if that array accidentally described a square, the individual array addresses would have nothing to do with that.
It's not a question of the representation, it's a question of quanta. Pixels are data, the little squares are the artifacts your LCD eventually produces with the help of that data.
When your GPU is rasterizing the edges of polygons, it computes (or sometimes just approximates) how much of a little square is covered by that polygon and uses that as the weight when averaging what color to assign to that pixel. The resulting rendered image is most correctly interpreted as an array of little squares, not point samples, and definitely not truncated Gaussians.
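A minimal sketch of that coverage idea, treating each pixel as a unit square and weighting a polygon's color by the fraction of the square it covers; the half-plane "edge" and the sub-sampling grid below are illustrative assumptions, not any GPU's actual pipeline:

    def estimate_coverage(px, py, inside, grid=10):
        """Fraction of the unit-square pixel at (px, py) covered by the shape,
        estimated by testing a grid x grid pattern of sub-positions."""
        hits = 0
        for j in range(grid):
            for i in range(grid):
                x = px + (i + 0.5) / grid
                y = py + (j + 0.5) / grid
                if inside(x, y):
                    hits += 1
        return hits / (grid * grid)

    def inside_edge(x, y):
        # a half-plane "polygon edge": everything left of x = 2.3 is covered
        return x < 2.3

    background, shape = 0.0, 1.0
    row = []
    for px in range(5):
        a = estimate_coverage(px, 0, inside_edge)
        row.append(a * shape + (1 - a) * background)  # coverage-weighted (box-filtered) blend
    print(row)  # fully covered pixels -> 1.0, the straddled pixel -> 0.3, the rest -> 0.0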
Actually no, that's not the case: rasterization is on-off at the hardware level. You need anti-aliasing for the behaviour you are describing, which very rarely works the way you describe; the best we have right now as far as quality goes uses multisampling.
And that is not done by the GPU. The top comment started with "your GPU does...."
In any case, that is an extreme edge case of software renderers that doesn't even come close to a significant part of 2D graphics in real life. Indeed, most 2D graphics is really flat 3D graphics done using GPU routines and does not work that way. I know that some extreme edge cases do use coverage-based rasterization, but:
>You need anti-aliasing for the behaviour you are describing, which very rarely works the way you describe
This is a case of anti-aliasing (read the title of the article) and is extremely rarely used. It's essentially irrelevant when discussing how graphics work in real life.
I really cannot overstate just how rarely software rasterizers are used for interactive graphics in 2020; coverage-based rasterizers are an even smaller subset of that. It makes a ton more sense to use a GPU rasterizer with MSAA, or to oversample the whole image.
2D graphics on the GPU is an open research problem. My understanding is that piet-gpu and Pathfinder, both state-of-the-art research projects, use coverage-based solutions running on the GPU. MSAA 16x, which is incredibly expensive on mobile devices, only provides 16 coverage samples per pixel (so coverage is quantized to steps of 1/16), and from my limited tests was poorer quality than a coverage-based solution.
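To make the quantization point concrete, here is a rough sketch comparing the exact covered fraction of a pixel cut by a vertical edge with what 16 samples per pixel can report; the ordered 4x4 grid is an assumption, real MSAA hardware uses vendor-specific sample patterns:

    def msaa_coverage(edge_x, samples_per_side=4):
        """Fraction of sub-samples of a unit pixel [0, 1) lying left of a vertical edge."""
        n = samples_per_side
        hits = sum(1 for i in range(n) for j in range(n)
                   if (i + 0.5) / n < edge_x)
        return hits / (n * n)

    for edge_x in (0.10, 0.33, 0.50, 0.70):
        exact = edge_x                  # exact covered fraction of the pixel
        msaa16 = msaa_coverage(edge_x)  # quantized to sixteenths
        print(f"edge at {edge_x:.2f}: exact={exact:.3f}  16-sample={msaa16:.3f}")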
2D graphics on the GPU is not an open research problem in practice. In real life you either use Direct2D or OpenGL/Vulkan/Direct3D and just... ignore a dimension.
Yes, MSAA 16x is incredibly expensive on mobile devices, and it provides a worse result than a coverage-based approach. But MSAA is done by an ASIC and is simpler than coverage-based AA; in performance it is not even close. A GPU ROP trounces any programmable compute unit because it is specialized, fixed-function silicon. And in practice MSAA 8x is more than good enough, especially on mobile devices. You certainly will not notice a difference on a phone with a density of 563 dpi between MSAA 4x and 8x, let alone between 16x and coverage-based AA.
At those densities, the resolution of the phone is literally greater than the resolution of the optical system that is your eyes. There is no point in anything beyond MSAA 4x in reality, and a lot of people with displays in the 200 dpi range just use MSAA 2x when they could use MSAA 8x, because they really can't tell the difference.
The final nail in the coffin is that these compute-based rasterization engines so far more or less match the performance of CPU rasterization. This is simply unacceptable when direct GPU rasterization can give nearly indistinguishable results at multiple times the performance and much lower power usage. It is literally taking something done by a highly optimized ASIC on a 12 to 7 nm process and trying to do it through compute for a tiny improvement. It's absurd.
Some rasterization algorithms do work this way, but they are arguably getting suboptimal results and would do better to apply some filter other than a box. (As pixels keep getting smaller it matters less, though.)
> resulting rendered image is most correctly interpreted as an array of little squares
Still nope. What matters in the end is the viewer’s eyes/brain reconstruction of the image, and given the frequency response of human eyes to typical screens at typical viewing distances, there is little if any practical difference between convolving some eye-like reconstruction filter with pixels thought of as uniform-brightness squares vs. point samples.
If you want to improve your results you’ll get much more bang for your buck from considering RGB subpixels to be point samples offset the appropriate amounts for the given physical display than you’ll get from thinking of any of them as being an area light source instead of a point.
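A back-of-the-envelope version of that argument: blur one row of pixel values with an eye-like Gaussian, once treating each pixel as a uniform one-pixel-wide square and once as a point sample, and compare. The sigma of one pixel is an arbitrary stand-in for viewing distance, not a measured figure:

    import math

    def gaussian(x, sigma=1.0):
        return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

    pixels = [0, 0, 0, 1, 1, 1, 0, 0]      # a dark-bright-dark strip, one value per pixel
    xs = [i / 10 for i in range(-10, 90)]  # fine evaluation grid, in pixel units

    def reconstruct(x, as_squares):
        total = 0.0
        for i, v in enumerate(pixels):
            if as_squares:
                # integrate the blur over the pixel's one-unit extent (crude 11-point rule)
                total += v * sum(gaussian(x - (i - 0.5 + k / 10)) for k in range(11)) / 11
            else:
                total += v * gaussian(x - i)  # pixel as a point sample
        return total

    worst = max(abs(reconstruct(x, True) - reconstruct(x, False)) for x in xs)
    print(f"largest difference between the two models: {worst:.3f}")  # roughly a couple of percent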
The data often "knows" where it will be displayed, and is designed with that knowledge. To go to the absurd extreme here, the data is just 1s and 0s, not pixels or samples or anything more.
Of course that's nonsense, because the data has context and the arrangement of those samples or pixels has a purpose.
Sometimes that purpose is to serve as a sampling of a real-world continuous image, other times it is to describe the arrangement and color of tiny little squares on an LCD screen.
Depending on what context you're working in, pixels may be squares, or they may not.
A UI element will almost never be perfectly square and exactly aligned on a pixel. And even if it were, LCD pixels aren't even squares: they are three rectangles with different colors and gaps between them. This means that even a pixel-perfect square can't be represented perfectly; because of the gaps between pixels and because of the sub-pixels, its representation would not actually be a square.
Most pixel art today is designed around square pixels. The pixels are typically blown up to some multiple for viewing (and in many cases not even an integer multiple!). This has resulted in the design of filter kernels that make some trade-off between the true sharpness of square pixels and the aliasing of nearest-neighbor interpolation.
Therefore, even before images hit the display there is a rationale to avoid photographic resampling techniques, simply because the way the images were authored defined their meaning otherwise.
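One such trade-off kernel, sometimes called "sharp bilinear", upscales by the largest whole-number factor with nearest-neighbor first (keeping the squares crisp) and resamples only the small remainder linearly. A rough 1-D sketch; the helper names are made up for illustration, not any library's API:

    def nearest_1d(row, factor):
        # nearest-neighbor upscale: repeat each source value `factor` times
        return [v for v in row for _ in range(factor)]

    def linear_resample_1d(row, new_len):
        # plain linear resampling of a 1-D row
        out = []
        for i in range(new_len):
            x = (i + 0.5) * len(row) / new_len - 0.5   # map output center into the source
            x = min(max(x, 0.0), len(row) - 1.0)
            lo = int(x)
            hi = min(lo + 1, len(row) - 1)
            t = x - lo
            out.append(row[lo] * (1 - t) + row[hi] * t)
        return out

    def sharp_bilinear_1d(row, target_len):
        factor = max(1, target_len // len(row))        # largest integer upscale first
        return linear_resample_1d(nearest_1d(row, factor), target_len)

    art = [0, 0, 255, 255, 0, 255]                     # a tiny 1-D "sprite"
    print(sharp_bilinear_1d(art, 17))                  # mostly hard steps, only a few blended values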
See also: The gradual evolution of font rendering techniques. Earlier versions of Windows aimed for pixel-grid snapping to produce clean, sharp edges. Newer ones introduced anti-aliasing techniques, but again made trade-offs towards sharp edges, including exploitation of the LCD display format, while the contemporary Apple rendering favored the photographic approach. With the introduction of high DPI the differences have become less pronounced, but there's still disagreement about how best to render vector fonts into pixels.
Then how come a column of white pixels next to a column of black pixels looks perfectly sharp, as sharp as a black sheet of paper on a white background?
Whatever pixels "are", they are certainly tools for inducing a certain perception in the eye of the viewer, so we should go by how that works. We're used to this from color itself - no one denies that 0xffff00 is "yellow", despite the physical emission containing no yellow light, because red + green in certain proportions induces the same response in our eyes as light whose wavelength is actually yellow. So why can't we apply the term "squares" to things that our eyes see as squares even if they are not physically squares?
Open this in IrfanView, set the magnification to 100%. It looks about 75%-25% dark grey to light grey unless you really focus on it. This is exactly what you would expect if you conceptualize pixels as points.
So no, a column of white pixels next to a column of black pixels does not look perfectly sharp. If it did, the image above would look perfectly black and white, with no grey. And by the time I get close enough to the pixels to see perfect black vs perfect white, I also notice the black gaps between pixels, and very soon the sub-pixels themselves!
This is also why you don't see individual sub-pixels, by the way.
Your eyes largely cannot distinguish the individual squares that pixels are on any non-ancient screen. Your eye does not form an image of each square pixel; it loses them as they blend into each other to form an image in which pixels are much, much closer to points than to squares.
Not 'perfectly' sharp, but if you have a bunch of objects where the edges go from 0 to 200, and you have a bunch of objects where the edges go from 0 to 100 to 200, there is a significant difference in sharpness on most screens.
Sure. And that is also perfectly consistent with considering pixels as dimensionless points forming an image, which in practice they really are, much more than they are simply squares.
Indeed, an edge that goes from 0 to 100 can be considered as part of a wave of twice the frequency but the same amplitude as compared to an edge that goes from 0 to 200. Which is, by the way, why increasing the contrast in an image, especially micro-contrast, in practice increases perceived resolution.
This is supposed to be an edge with nearly perfect sharpness.
If you take a single point sample, then slowly moving objects will appear to jump an entire pixel at a time, looking awful.
If you antialias, then movement will look smooth, but you'll also notice that when the edge aligns to the pixel squares it preserves its sharpness better.
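A tiny sketch of those two behaviours for an edge drifting right by a quarter pixel per frame; the half-plane edge and box-filter weighting are just illustrative choices:

    def point_sample_row(edge_x, width=6):
        # pixel i is "on" iff its center lies left of the edge
        return [1.0 if i + 0.5 < edge_x else 0.0 for i in range(width)]

    def box_filter_row(edge_x, width=6):
        # pixel i gets the fraction of its [i, i+1) extent lying left of the edge
        return [min(max(edge_x - i, 0.0), 1.0) for i in range(width)]

    for frame in range(5):
        edge_x = 2.0 + 0.25 * frame
        print(f"edge at {edge_x:.2f}  point={point_sample_row(edge_x)}  box={box_filter_row(edge_x)}")
    # point sampling stays put and then jumps a whole pixel at once; the box filter eases
    # through 0.25, 0.5, 0.75 and snaps back to a hard edge whenever the edge lands on the grid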
You have to be really careful when you're applying wave equations to resolution, especially when declaring that a certain number of samples fully captures an image.
If you want to display a perfect image with point samples, you may need to go as far as 10x the 'retina' density.
https://en.wikipedia.org/wiki/Hyperacuity_(scientific_term)
Take a sheet of white paper 10 inches wide and draw 480 evenly spaced vertical lines on that paper. Do you see the lines? Or do you see a gray sheet of paper?
>looks perfectly sharp, as sharp as a black sheet of paper
They don't. Pixels are so small that they start looking like (or are well into looking like) samples that are used to form an image. If pixels "looked perfectly sharp, as sharp as a black sheet of paper", this wouldn't be the case.
As far as your eyes are concerned, the sharp lines might as well be blurry waves of the appropriate contrast. So in an image sense, the lines don't really exist anymore; all that exists is a blurry brightness function, not infinitely sharp lines.
I don't know about yours, but my mouse cursor is not a perfectly straight, 2+ pixel wide, solid-colored rectangle, so that doesn't really matter. As for window edges, I'll give you that one.
However, the edge of my window happens not to be green, so it doesn't actually align with the elements of my LCD :) I just so happen not to notice it, because my eyes don't have individual pixel resolution, almost as if there was a low-pass filter of the order > 2 lambda... Food for thought!
That's true, but we're really getting there fast. My very sharp Minolta f/1.4 full-frame lens is already diffraction-limited at f/8 to f/11 at a resolution of 6000 x 4000. It's also much, much sharper than the vast majority of pictures people take.
While yes, 6000x4000 is a lot, monitors are already coming out with higher resolutions, so it's relevant right now. The fraction of images captured at an actual resolution so high that they have to be downscaled for a 6K or 8K monitor is, in practice, exceedingly thin. Even with a Sony A7RIV, an insane camera, and mind-bogglingly sharp lenses, after the Bayer filter and after taking sampling into account (which is very real in photography, hence moiré), most of your pictures will not, whether due to depth of field, optical aberrations or motion blur, reach the level where you can create a truly sharp edge between two adjacent pixels.
So while it is often true, this is increasingly not the case.
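For anyone who wants to sanity-check the diffraction-limit numbers above, a rough back-of-the-envelope comparing the Airy disk diameter with the pixel pitch of a 6000 x 4000 full-frame sensor; the 550 nm wavelength, the textbook 2.44 * wavelength * f-number formula, and the two-pixel threshold are all crude rules of thumb:

    # Airy disk (first-null) diameter vs. pixel pitch on a 36 mm wide, 6000-pixel-across sensor.
    wavelength_mm = 550e-6
    sensor_width_mm, pixels_across = 36.0, 6000
    pixel_pitch_um = sensor_width_mm / pixels_across * 1000   # = 6 microns

    for f_number in (4, 5.6, 8, 11, 16):
        airy_um = 2.44 * wavelength_mm * f_number * 1000
        regime = "diffraction-limited" if airy_um > 2 * pixel_pitch_um else "pixel-limited"
        print(f"f/{f_number}: Airy diameter ~{airy_um:.1f} um vs pixel pitch {pixel_pitch_um:.0f} um ({regime})")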
Pixel-aligned sharp edges were super important on low-DPI displays, but they are increasingly less important. A one-pixel line on a phone screen is tiny, so most designs will use lines of more than one pixel. A pixel-perfect 3-pixel line doesn't look very different from an anti-aliased 3-pixel line with arbitrary floating-point coordinates on a phone.
Twitter? Because Twitter has been doing that a lot for me recently. It's like all their devs are on mobile or Retina screens and overpowered CPUs and nobody noticed the problem (and resulting processing overhead).