I love to see more work in this space. It's clear that GPU compute is the future of 2D rendering, and we need to explore a bunch of different approaches to find the best one. I especially appreciate the focus on rendering quality; the author is absolutely correct that the current state of Vello has conflation artifacts and does not do antialiasing in the correct (linear) colorspace.
We do have a plan for conflation free compositing[1] which should closely approximate the quality of the samples here. That in turn depends on sparse strips[2], though a degraded performance experiment could be done to validate the quality outcomes. Sparse strips in turn depend on high performance segmented sort[3].
The analytic approach to path overlaps is intriguing, but I think it will be very challenging to implement efficiently on GPU. I'm looking forward to seeing what results come of it.
The analytic approach to occlusion definitely does seem like a "humbling parallelism" type of problem on the GPU. My curiosity is leading me to explore it, and it may be reasonable if I find alternatives to large GPU sorts (although I understand you've done some work on that recently). I think the Vello approach is very likely the superior option for the best general quality/performance tradeoff.
Ah, you watched my "good parallel computer" talk[*] :)
If I were going to take it on, I'd start with BVH construction - the H-PLOC paper at the latest HPG [1] looks promising - then traverse down the hierarchy until you get a very small number of path segments so you can pairwise compare them. Obviously any time there is an intersection you need at least the two segments.
This seems hard to me, humbling even. I mean, overlap removal is hard enough on the CPU, especially because it's so sensitive to numerical robustness, and doubly so for curves. But I think you'll learn something from trying!
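For the pairwise step at the leaves, here is a minimal sketch (assuming curves have already been flattened to line segments; all names are illustrative, and real code would need robust or exact predicates, which is the hard part):

```rust
#[derive(Clone, Copy)]
struct Pt { x: f64, y: f64 }

// Signed area of the triangle (o, a, b); the sign gives the orientation.
fn cross(o: Pt, a: Pt, b: Pt) -> f64 {
    (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x)
}

/// True if segments (a0,a1) and (b0,b1) properly intersect.
/// Collinear and touching cases are ignored here for brevity.
fn segments_intersect(a0: Pt, a1: Pt, b0: Pt, b1: Pt) -> bool {
    let d1 = cross(b0, b1, a0);
    let d2 = cross(b0, b1, a1);
    let d3 = cross(a0, a1, b0);
    let d4 = cross(a0, a1, b1);
    (d1 * d2 < 0.0) && (d3 * d4 < 0.0)
}

fn main() {
    let hit = segments_intersect(
        Pt { x: 0.0, y: 0.0 }, Pt { x: 1.0, y: 1.0 },
        Pt { x: 0.0, y: 1.0 }, Pt { x: 1.0, y: 0.0 },
    );
    println!("intersect: {hit}"); // intersect: true
}
```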
I'm curious what your thoughts are on my approach for robustness.
I've written similar operations[1] that include elliptical arcs AND Bézier curves, and robustness has been an issue. An assortment of polygon clipping approaches use ONLY line segments and fixed precision numbers to work around this.
If I discretize the line segments to 20-bit fixed-point coordinates, potentially in tiles (to reduce inaccuracies), then I can represent exact rational intersections (and parametric t values) with 64-bit numerators and 64-bit denominators[2].
The computation cost concerns me significantly (and the memory, if I need to store them), but using this scheme to ensure absolute correctness (and ordering) of intersections does seem to work in CPU proofs of concept. Perhaps it is possible to use this expensive method only to disambiguate cases that can't be resolved with floating-point arithmetic.
My initial GPU concept imagined sorting the intersected segments; however, a radix sort over 128-bit keys seems like a non-starter, and if I go down that path, perhaps a merge sort would still be viable?
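Roughly, the core of the scheme looks like this sketch (a minimal illustration with made-up names, assuming segment endpoints are already snapped to 20-bit integers, so the cross products comfortably fit in i64):

```rust
#[derive(Clone, Copy)]
struct IPt { x: i64, y: i64 } // coordinates assumed to fit in 20 bits

fn cross(ax: i64, ay: i64, bx: i64, by: i64) -> i64 {
    // Components are at most ~2^21 after differencing, so products fit in i64.
    ax * by - ay * bx
}

/// Exact parametric t of the intersection of p0->p1 with the line q0->q1,
/// returned as (numerator, denominator); None if parallel/degenerate.
/// To order intersections along a segment exactly, compare n1*d2 vs n2*d1 in i128.
fn exact_t(p0: IPt, p1: IPt, q0: IPt, q1: IPt) -> Option<(i64, i64)> {
    let (dx, dy) = (p1.x - p0.x, p1.y - p0.y);
    let (ex, ey) = (q1.x - q0.x, q1.y - q0.y);
    let den = cross(dx, dy, ex, ey);
    if den == 0 {
        return None;
    }
    let num = cross(q0.x - p0.x, q0.y - p0.y, ex, ey);
    Some((num, den)) // t = num / den, exactly
}

fn main() {
    let t = exact_t(
        IPt { x: 0, y: 0 }, IPt { x: 4, y: 4 },
        IPt { x: 0, y: 4 }, IPt { x: 4, y: 0 },
    );
    println!("{t:?}"); // Some((-16, -32)), i.e. t = 1/2
}
```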
If it's textured, an anisotropic filter is a good approximation of the constant colour over the pixel area. The main source of aliasing is the polygon edge, not the texture.
Is that necessarily the case for vector graphics? I actually don’t know how they define their colors. It seems intuitive enough to define a gradient using a function rather than discrete series, but I have no idea if anyone actually does that.
They absolutely do define gradients via functions - but the point is, the screen only has a finite number of pixels. As I understood it, pixels are what this technique uses as its dicing primitive: pixel-sized rectangles cut out of the polygon.
Pixar's Renderman used to use a (slightly) related technique. This was a long time ago, and I have no idea if they still do things this way - but they would subdivide patches until each primitive was smaller than a pixel.
Ahh yes, for exact filtering it does need to be constant colour. I'm looking into seeing whether it can be done for gradients. However in practice, it works quite well visually to compute the "average color of the polygon" for each piecewise section, and blend those together.
Arvo’s work is also using Green’s theorem, in much the same way this article is. Integrating the phong-exponent light reflections is a little bit insane though (and btw I’ve seen Arvo’s code for it.)
The problem you run into with non-constant polygon colors is that you’d have to integrate the product of two different functions here - the polygon color and the filter function. For anything real-world, this is almost certainly going to result in an expression that is not analytically integrable.
Yup, I've tried many years ago to follow this work (special cases for even and odd exponents!), with a mix of analytic and numerical integration. IMO linear/tent filter is sufficient. Also more recently there's the Linearly Transformed Cosine stuff, which is most of what people usually want in realtime graphics.
Ideally you also want motion blur and probably some other effects, so IMO it just makes sense to use a 2D BVH and high efficiency Monte Carlo importance sampling methods.
I'm surprised that the article doesn't mention Fourier transforms and neither do any of the comments. All the talk about aliasing and different filters makes a whole lot more sense if you take a look at the frequency domain. (Unfortunately I don't have time to elaborate here. But Fourier transforms are so useful in many ways that I encourage people to learn more.)
If you want to go hardcore into the frequency domain, may I suggest "A Fresh Look at Generalized Sampling" by Nehab and Hoppe[1]. That said, while a frequency-centric approach works super well for audio, for images you need to keep good track of what happens in the spatial domain, and in particular the support of sampling filters needs to be small.
The problem with these sorts of analytical approaches is how to handle backgrounds, depth and intersections. There are good reasons why GPUs rely on variations of multisampling. Even CPU-based 3D render engines use similar methods rather than analytic filters, as far as I know.
A more interesting approach to antialiasing, in my opinion, is the use of neural nets to generate aesthetically pleasing outputs from limited sample data, as seen for example in NVidia's DLAA [0]. These methods go beyond trying to optimize over-simplistic signal processing reconstruction metrics.
If only it was so simple. You typically don't have the memory capacity and computational budget to sort and render the whole scene back to front. You can use bucketing and other tricks to try and do a better job, but at the end of the day it is just impractical. This method has been studied for decades and it is still not in common use.
For comparison, Tom Duff has just made available a copy of his 1989 paper "Polygon scan conversion by exact convolution" which uses more or less the same mathematical trick. It involves a scanline algorithm and avoids polygon clipping.
> This is equivalent to applying a box filter to the polygon, which is the simplest form of filtering.
Am I the only one who has trouble understanding what is meant by this? What is the exact operation that's referred to here?
I know box filters in the context of 2D image filtering and they're straightforward but the concept of applying them to shapes just doesn't make any sense to me.
The operation (filtering an ideal, mathematically perfect image) can be described in two equivalent ways:
- You take a square a single pixel spacing wide by its center and attach it to a sampling point (“center of a pixel”). The value of that pixel is then your mathematically perfect image (of a polygon) integrated over that square (and normalized). This is perhaps the more intuitive definition.
- You take a box kernel (the indicator function of that square, centered, normalized), take the convolution[1] of it with the original perfect image, then sample the result at the final points (“pixel centers”). This is the standard definition, which yields exactly the same result as long as your kernel is symmetric (which the box kernel is).
The connection with the pixel-image filtering case is that you take the perfect image to be composed of delta functions at the original pixel centers and multiplied by the original pixel values. That is, in the first definition above, “integrate” means to sum the original pixel values multiplied by the filter’s value at the original pixel centers (for a box filter, zero if outside the box—i.e. throw away the addend—and a normalization constant if inside it). Alternatively, in the second definition above, “take the convolution” means to attach a copy of the filter (still sized according to the new pixel spacing) multiplied by the original pixel value to the original pixel center and sum up any overlaps. Try proving both of these give the answer you’re already accustomed to.
This is the most honest signal-processing answer, and it might be a bit challenging to work through but my hope is that it’ll be ultimately doable. I’m sure there’ll be neighboring answers in more elementary terms, but this is ultimately a (two-dimensional) signal processing task and there’s value in knowing exactly what those signal processing people are talking about.
[1] (f∗g)(x) = (g∗f)(x) = ∫f(y)g(x-y)dy is the definition you’re most likely to encounter. Equivalently, (f∗g)(x) is f(y)g(z) integrated over the line (plane, etc.) x=y+z, which sounds a bit more vague but exposes the underlying symmetry more directly. Convolving an image with a box filter gives you, at each point, the average of the original over the box centered around that point.
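If it helps, here is a tiny numeric sketch (illustrative, 1D, made-up values) showing the two definitions above agreeing for the familiar discrete-pixel case:

```rust
// Original samples sit at integer positions with spacing 1; we box-filter
// with a width-3 box centered at position `c`.
fn box_kernel(x: f64, width: f64) -> f64 {
    if x.abs() <= width / 2.0 { 1.0 / width } else { 0.0 }
}

fn main() {
    let samples = [10.0, 20.0, 30.0, 40.0, 50.0]; // values at x = 0,1,2,3,4
    let (c, width) = (2.0, 3.0);

    // Definition 1: integrate the delta-train image over the box and normalize.
    // For deltas at unit spacing this is just the average of the covered samples.
    let covered: Vec<f64> = (0..samples.len())
        .filter(|&i| (i as f64 - c).abs() <= width / 2.0)
        .map(|i| samples[i])
        .collect();
    let def1 = covered.iter().sum::<f64>() / covered.len() as f64;

    // Definition 2: convolve with the (normalized) box kernel, then sample at c.
    let def2: f64 = (0..samples.len())
        .map(|i| samples[i] * box_kernel(c - i as f64, width))
        .sum();

    println!("{def1} {def2}"); // both 30: the average of 20, 30, 40
}
```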
There’s a picture of the exact operation in the article. Under “Filters”, the first row of 3 pictures has the caption “Box Filter”. The one on the right (with internal caption “Contribution (product of both)”) demonstrates the analytic box filter. The analytic box filter is computed by taking the intersection of the pixel boundary with all visible polygons that touch the pixel, and then summing the resulting colors weighted by their area. Note the polygon fragments also have to be non-overlapping, so if there are overlapping polygons, the hidden parts need to be first trimmed away using boolean clipping operations. This can all be fairly expensive to compute, depending on how many overlapping polygons touch the pixel.
OK, so reading a bit further this boils down to clipping the polygon to the pixel and then using the shoelace formula for finding the area? Why call it "box filter" then?
It’s very useful to point out that it’s a Box Filter because the article moves on to using other filters, and larger clipping regions than a single pixel. This is framing the operation in known signal processing terminology, because that’s what you need to do in order to fully understand very high quality rendering.
Dig a little further into the “bilinear filter” and “bicubic filter” that follow the box filter discussion. They are more interesting than the box filter because the contribution of a clipped polygon is not constant across the polygon fragment, unlike the box filter which is constant across each fragment. Integrating non-constant contribution is where Green’s Theorem comes in.
It’s also conceptually useful to understand the equivalence between box filtering with analytic computation and box filtering with multi-sample point sampling. It is the same mathematical convolution in both cases, but it is expressed very differently depending on how you sample & integrate.
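To make the box-filter case concrete, here is a minimal sketch of the clip-then-shoelace computation discussed above (illustrative only; a real renderer also needs overlap removal, winding rules, and numerical robustness):

```rust
type P = (f64, f64);

/// Keep the part of `poly` where `inside` is true, clipping crossing edges
/// against the boundary via `intersect` (one Sutherland–Hodgman pass).
fn clip_half_plane(poly: &[P], inside: impl Fn(P) -> bool, intersect: impl Fn(P, P) -> P) -> Vec<P> {
    let mut out = Vec::new();
    for i in 0..poly.len() {
        let (cur, nxt) = (poly[i], poly[(i + 1) % poly.len()]);
        match (inside(cur), inside(nxt)) {
            (true, true) => out.push(nxt),
            (true, false) => out.push(intersect(cur, nxt)),
            (false, true) => { out.push(intersect(cur, nxt)); out.push(nxt); }
            (false, false) => {}
        }
    }
    out
}

fn lerp_x(a: P, b: P, x: f64) -> P { (x, a.1 + (b.1 - a.1) * (x - a.0) / (b.0 - a.0)) }
fn lerp_y(a: P, b: P, y: f64) -> P { (a.0 + (b.0 - a.0) * (y - a.1) / (b.1 - a.1), y) }

/// Fraction of the pixel [px,px+1]x[py,py+1] covered by a simple polygon.
fn box_coverage(poly: &[P], px: f64, py: f64) -> f64 {
    let mut p = poly.to_vec();
    p = clip_half_plane(&p, |v| v.0 >= px,       |a, b| lerp_x(a, b, px));
    p = clip_half_plane(&p, |v| v.0 <= px + 1.0, |a, b| lerp_x(a, b, px + 1.0));
    p = clip_half_plane(&p, |v| v.1 >= py,       |a, b| lerp_y(a, b, py));
    p = clip_half_plane(&p, |v| v.1 <= py + 1.0, |a, b| lerp_y(a, b, py + 1.0));
    // Shoelace formula for the clipped polygon's (absolute) area.
    let mut area = 0.0;
    for i in 0..p.len() {
        let (a, b) = (p[i], p[(i + 1) % p.len()]);
        area += a.0 * b.1 - b.0 * a.1;
    }
    (area * 0.5).abs()
}

fn main() {
    // A triangle covering the lower-left half of pixel (0, 0).
    let tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)];
    println!("{}", box_coverage(&tri, 0.0, 0.0)); // 0.5
}
```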
Oh interesting, I hadn’t thought about it, but why does it seem like poor naming? I believe “filter” is totally standard in signal processing and has been for a long time, and that term does make sense to me in this case because what we’re trying to do is low-pass filter the signal, really. The filtering is achieved through convolution, and to your point, I think there are cases where the filter function does get referred to as a basis function.
I would think that, conceptually, a basis function is different from a filter function, because a basis function is usually about transforming a point in one space to some different space, and basis functions come in a set whose size is the dimensionality of the target space. Filters, even if you can think of the function as a sort of basis, aren’t meant for changing spaces or encoding & decoding against a different basis than the signal. Filters transform the signal but keep it in the space it started from, and the filter is singular and might lose data.
It is more similar to the convolution of the shape with the filter (you can take the product of the filter, at various offsets, with the polygon).
Essentially if you have a polygon function p(x,y) => { 1 if inside the polygon, otherwise 0 }, and a filter function f(x,y) centered at the origin, then you can evaluate the filter at any point x_0,y_0 with the double-integral / total sum of f(x-x_0,y-y_0)*p(x,y).
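As a purely illustrative sketch (made-up names, brute force), that expression can be approximated by dense point sampling; the point of the article is that the same integral can instead be evaluated analytically:

```rust
fn p(x: f64, _y: f64) -> f64 {
    // Indicator of a half-plane "polygon": 1 left of the line x = 0.3.
    if x < 0.3 { 1.0 } else { 0.0 }
}

fn f(x: f64, y: f64) -> f64 {
    // Unit box filter centered at the origin.
    if x.abs() <= 0.5 && y.abs() <= 0.5 { 1.0 } else { 0.0 }
}

/// Approximate the double integral of f(x - x0, y - y0) * p(x, y)
/// with an n-by-n grid of point samples over the filter's support.
fn filtered_value(x0: f64, y0: f64, n: u32) -> f64 {
    let (mut sum, step) = (0.0, 1.0 / n as f64);
    for i in 0..n {
        for j in 0..n {
            let x = x0 - 0.5 + (i as f64 + 0.5) * step;
            let y = y0 - 0.5 + (j as f64 + 0.5) * step;
            sum += f(x - x0, y - y0) * p(x, y) * step * step;
        }
    }
    sum
}

fn main() {
    // Pixel centered at x = 0: the half-plane covers [-0.5, 0.3] of it.
    println!("{:.3}", filtered_value(0.0, 0.0, 512)); // ≈ 0.8
}
```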
This kind of makes sense from a mathematical point of view, but how would this look implementation-wise, in a scenario where you need to render a polygon scene? The article states that box filters are "the simplest form of filtering", but it sounds quite non-trivial for that use case.
If it essentially calculates the area of the polygon inside the pixel box and then assigns a colour to the pixel based on the area portion, how would any spatial aliasing artifacts appear? Shouldn't it be equivalent to super-sampling with infinite sample points?
It literally means that you take a box-shaped piece of the polygon, ie. the intersection of the polygon and a box (a square, in this case the size of one pixel). And do this for each pixel as they’re processed by the rasterizer. If you think of a polygon as a function from R^2 to {0, 1}, where every point inside the polygon maps to 1, then it’s just a signal that you can apply filters to.
But as I understand it, the article is about rasterization, so if we filter after rasterization, the sampling has already happened, no? In other words: Isn't this about using the intersection of polygon x square instead of single sample per pixel rasterization?
This is about taking an analytic sample of the scene with an expression that includes and accounts for the choice of filter, instead of integrating some number of point samples of the scene within a pixel.
In this case, the filtering and the sampling of the scene are both wrapped into the operation of intersection of the square with polygons. The filtering and the sampling are happening during rasterization, not before or after.
Keep in mind a pixel is an image sample, which is different from taking one or many point-samples of the scene in order to compute the pixel color.
The problem is determining the coverage, the contribution of the polygon to a pixel's final color, weighted by a filter. This is relevant at polygon edges, where a pixel straddles one or more edges, and some sort of anti-aliasing is required to prevent jaggies[1] and similar aliasing artifacts, such as moiré, which would result from naive discretization (where each pixel is either 100% or 0% covered by a polygon, typically based on whether the polygon covers the pixel center).
I am quite convinced that if the goal is the best possible output quality, then the best approach is to analytically compute the non-overlapping areas of each polygon within each pixel, resolving all contributions (areas) together in a single pass for each pixel.
Why are you convinced of this, and can I help unconvince you? ;) What you describe is what’s called “Box Filtering” in the article. Box filtering is well studied, and it is known to not be the best possible output quality. The reason this is not the best approach is because a pixel is not a little square, a pixel is a sample of a signal, and it has to be approached with signal processing and human perception in mind. (See the famous paper linked in the article: A Pixel is Not a Little Square, A Pixel is Not a Little Square, A Pixel is Not a Little Square http://alvyray.com/Memos/CG/Microsoft/6_pixel.pdf)
It can be surprising at first, but when you analytically compute the area of non-overlapping parts of a pixel (i.e., use Box Filtering) you can introduce high frequencies that cause visible aliasing artifacts that will never go away. This is also true if you are using sub-sampling of a pixel, taking point samples and averaging them, no matter how many samples you take.
You can see the aliasing I’m talking about in the example at the top of the article, the 3rd one is the Box Filter - equivalent to computing the area of the polygons within each pixel. Look closely near the center of the circle where all the lines converge, and you can see little artifacts above and below, and to the left and right of the center, artifacts that are not there in the “Bilinear Filter” example on the right.
I think the story is a lot more complicated. Talking about "the best possible output quality" is a big claim, and I have no reason to believe it can be achieved by mathematically simple techniques (ie linear convolution with a kernel). Quality is ultimately a function of human perception, which is complex and poorly understood, and optimizing for that is similarly not going to be easy.
The Mitchell-Netravali paper[1] correctly describes sampling as a tradeoff space. If you optimize for frequency response (brick wall rejection of aliasing) the impulse response is sinc and you get a lot of ringing. If you optimize for total rejection of aliasing while maintaining positive support, you get something that looks like a Gaussian impulse response, which is very smooth but blurry. And if you optimize for small spatial support and lack of ringing, you get a box filter, which lets some aliasing through.
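(For reference, a sketch of the two-parameter Mitchell-Netravali kernel; the (B, C) pair is exactly the tradeoff knob being described, with B=1, C=0 giving the smooth-but-blurry cubic B-spline, B=0, C=1/2 the sharper-but-ringy Catmull-Rom, and B=C=1/3 the paper's recommended compromise.)

```rust
fn mitchell_netravali(x: f64, b: f64, c: f64) -> f64 {
    let x = x.abs();
    let k = if x < 1.0 {
        (12.0 - 9.0 * b - 6.0 * c) * x.powi(3)
            + (-18.0 + 12.0 * b + 6.0 * c) * x.powi(2)
            + (6.0 - 2.0 * b)
    } else if x < 2.0 {
        (-b - 6.0 * c) * x.powi(3)
            + (6.0 * b + 30.0 * c) * x.powi(2)
            + (-12.0 * b - 48.0 * c) * x
            + (8.0 * b + 24.0 * c)
    } else {
        0.0
    };
    k / 6.0
}

fn main() {
    // Sample the recommended B = C = 1/3 filter across its support [-2, 2].
    for i in -20..=20 {
        let x = i as f64 * 0.1;
        println!("{x:+.1} {:.4}", mitchell_netravali(x, 1.0 / 3.0, 1.0 / 3.0));
    }
}
```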
Which is best, I think, depends on what you're filtering. For natural scenes, you can make an argument that the oblique projection approach of Rocha et al[2] is the optimal point in the tradeoff space. I tried it on text, though, and there were noticeable ringing artifacts; box filtering is definitely better quality to my eyes.
I like to think about antialiasing specific test images. The Siemens star is very sensitive in showing aliasing, but it also makes sense to look at a half-plane and a thin line, as they're more accurate models of real 2D scenes that people care about. It's hard to imagine doing better than a box filter for a half-plane; either you get ringing (which has the additional negative impact of clipping when the half-planes are at the gamut boundary of the display; not something you have to worry about with natural images) or blurriness. In particular, a tent filter is going to be softer but your eye won't pick up the reduction in aliasing, though it is certainly present in the frequency domain.
A thin line is a different story. With a box filter, you get basically a non antialiased line of single pixel thickness, just less alpha, and it's clearly possible to do better; a tent filter is going to look better.
But a thin line is just a linear combination of two half-planes. So if you accept that a box filter is better visual quality than a tent filter for a half-plane, and the other way around for a thin line, then the conclusion is that linear filtering is not the correct path to truly highest quality.
With the exception of thin lines, for most 2D scenes a box filter with antialiasing done in the correct color space is very close to the best quality - maybe the midwit meme applies, and it does make sense to model a pixel as a little square in that case. But I am interested in the question of how to truly achieve the best quality, and I don't think we really know the answer yet.
In my opinion, if you break down all the polygons in your scene into non-overlapping polygons, then clip them to pixels, calculate the color of each polygon piece (applying all paints, blend modes, etc.) and sum it up, then in the end that's the best visual quality you can get. And that's the idea I'm working on, but it involves the decomposition/clip step on the CPU, while the sum of paint/blend is done by the GPU.
That isn’t true. Again, please look more closely at the first example in the article, and take the time to understand it. It demonstrates there’s a better method than what you’re suggesting, proving that clipping to pixels and summing the area is not the best visual quality you can get.
As pointed out by Raphlinus, the moire pattern in the Siemens star isn't such a significant quality indicator for the type of content usually encountered in 2D vector graphics. With the analytical coverage calculation you can have perfect font/text rendering, perfect thin lines/shapes and, by solving all the areas at once, no conflation artifacts.
Raph made an argument that Box is good enough for lots of things, which is subjective and depends entirely on what things you’re doing, and how much you actually care about quality.
You are claiming it’s the best possible. Box filter is simply not the best possible, and this fact is well understood and documented.
You can relax your claim to say it’s good enough for what you need, and I won’t disagree with you anymore. Personally, I’m sensitive to visible pixelation, and the Box Filter will always result in some visible pixelation with all 2D vector graphics, so if you really care about high quality rendering, I’m very skeptical that you really want Box filtering as the ideal target. Box filter is a compromise, it’s easier & faster to compute. But it’s not the highest quality. It would be good to understand why that’s the case.
* Edit to further clarify and respond to this:
> With the analytical coverage calculation you can have perfect font/text rendering, perfect thin lines/shapes and, by solving all the areas at once, no conflation artifacts.
You cannot get perfect font or text rendering with a Box filter, and you will get some conflation artifacts. They might be very slight, and not bothersome to most people, but they do exist with a Box filter, always. This is a mathematical property of Box filtering, not a subjective claim.
Why do conflation artifacts always exist with a box filter? AFAIK conflation artifacts are a product of the compositing process, not the filtering process.
If you have two non-overlapping shapes of the same color covering the plane and use a box filter on the first shape to sample a pixel on the boundary, and then use the same box filter on the second shape, and then composite them with alpha blending, you get a conflation artifact along the boundary where the background bleeds through.
But if you use the fact that the shapes are non-overlapping and sum their contributions instead, the artifact disappears, while still using the same box filter.
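A tiny numeric illustration of that point (scalar "colors", made-up coverages):

```rust
fn main() {
    // Two shapes of the same color each cover half of a boundary pixel,
    // so their box-filter coverages are 0.5 and 0.5.
    let (cov_a, cov_b) = (0.5_f64, 0.5_f64);
    let (shape, background) = (1.0_f64, 0.0_f64); // "colors" as scalars

    // Composite each coverage as alpha with the over operator:
    // background bleeds through even though the shapes tile the pixel exactly.
    let after_a = cov_a * shape + (1.0 - cov_a) * background; // 0.5
    let blended = cov_b * shape + (1.0 - cov_b) * after_a;    // 0.75, not 1.0

    // Use the knowledge that the shapes don't overlap and sum coverages instead:
    let summed = (cov_a + cov_b) * shape;                     // 1.0, no bleed
    println!("alpha blended: {blended}, summed: {summed}");
}
```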
It’s because sampling artifacts never disappear with Box: the high frequency aliasing is introduced by the filtering itself. Because the Box has infinite frequency response, you cannot eliminate the artifacts. This is why all other, better filters fade their weight smoothly to zero at the support boundary.
You can see this with a single sharp edge, it doesn’t need to involve multiple polygons, nor even vector rendering, it happens when downsampling images too.
These are sampling artifacts, but I believe yorbwa is correct in distinguishing these from conflation artifacts, as defined in Kilgard & Bolz. I think of the latter as compositing not commuting exactly with antialiasing (sampling). You only get conflation artifacts when compositing multiple shapes (or rendering a single shape using analytical area when the winding number is not everywhere 0 or 1), while you definitely see aliasing when rendering a single shape, say a Siemens star.
Okay, that’s fair. I’m misusing the term ‘conflation’ in that sense. I was trying to make the point that compositing two wrong answers yields a wrong answer, but I stand corrected that it’s not the compositing that’s introducing error, it is the sampling + box-filtering.
I don't see how you can support the claim of perfect thin line rendering, it's visibly just not very good. So box filtering logically can't possibly be the best possible quality.
Can we make a magical adaptive filter which resembles box filter for half-planes, a tent filter for thin lines, Mitchell-Netravali or oblique projection for natural images, and Gaussian when filtering images for which high frequency detail is not important? Perhaps, but that feels like advanced research, and also computationally expensive. I don't think you can claim "perfect" without backing it up with human factors data really demonstrating that the filtered images are optimum with respect to perceived quality.
I wonder if the ideas behind GIMP's new nonlinear resampling filters (NoHalo and LoHalo, and eventually more https://graphicdesign.stackexchange.com/q/138059) may translate to vector rasterization in some form (though here we're translating continuous to discrete, not discrete to continuous to a differently spaced discrete).
Backing up to your earlier comment. Pixels on some displays are in fact little squares of uniform color. The question then is how to color a pixel given geometry with detail within that square.
All of this "filtering" is variations on adding blur. In fact the article extends the technique to deliberately blur images on a larger scale. When we integrate a function (which could be a color gradient over a fully filled polygon) and then paint the little square with a solid "average" color that's also a form of blurring (more like distorting in this case) the detail.
It is notable that the examples given are moving, which means moire patterns and other artifacts will have frame-to-frame effects that may be annoying visually. Simply blurring the image takes care of that at the expense of eliminating what looks like detail but may not actually be meaningful. Some of the less blurry images seem to have radial lines that bend and go back out in another location for example, so I'd call that false detail. It may actually be better to blur such detail instead of leaving it look sharper with false contours.
Yes it’s a good point that LCD pixels are more square than the CRTs that were ubiquitous when Alvy Ray wrote his paper. I think I even made that point before on HN somewhere. I did mention in response to Raph that yes the ideal target depends on what the display is, and the filter choice does depend on whether it’s LCD, CRT, film, print, or something else. That said, LCD pixels are not perfect little squares, and they’re almost never uniform color. The ideal filter for LCDs might be kinda complicated, and you’d probably need three RGB-separated filters.
Conceptually, what we’re doing is low-pass filtering, rather than blurring, so I wouldn’t necessarily call filtering just “adding blur”, but in some sense those two ideas are very close to each other, so I wouldn’t call it wrong either. :P The render filtering is a convolution integral, and is slightly different than adding blur to an image without taking the pixel shape into account. Here the filter’s quality depends on taking the pixel shape into account.
You’re right about making note of the animated examples - this is because it’s easier to demonstrate aliasing when animated. The ‘false detail’ is also aliasing, and does arise because the filtering didn’t adequately filter out high frequencies, so they’ve been sampled incorrectly and lead to incorrect image reconstruction. I totally agree that if you get such aliasing false detail, it’s preferable to err (slightly) on the side of blurry, rather than sharp and wrong.
Would DLP projectors which distribute color over time (a color wheel) or multiple light sources combined with dichroic filters, produce uniform squares of color?
In theory if the DMD mirrors were perfect little squares, and if the lens has perfect focus, and if the mirrors switch infinitely fast and are perfectly aligned with the color wheel in time, then maybe it’d be fair to call them uniform squares of color. In reality, the mirrors look square, but aren’t perfect squares - there’s variance in the flatness, aim, edges & beveling, and also both the lens and mirror switching blurs the pixels. The mirror switching over time is not infinitely fast, so the colors change during their cycle (usually multiple times per color of the wheel!) Not to mention some newer DLPs are using LEDs that are less square than DMD mirrors to begin with.
All this comes down to the projected pixels not being nearly as square as one might think (maybe that’s on purpose), though do note that squares are not the ideal shape of a pixel in the first place, for the same reason box filtering isn’t the best filter. If your pixel has sharp edges, that causes artifacts.
Oh I agree with all of that. And nice to see you on HN Raph - was nice to meet you at HPG the other day.
It’s subjective, so box filter being ‘close’ is a somewhat accurate statement. I’m coming from the film world, and so I have a pretty hard time agreeing that it’s “very” close. Box filter breaks often and easily, especially under animation, but it’s certainly better than nearest neighbor sampling, if that’s our baseline. Box filter is pretty bad for nearly any scenario where there are frequencies higher than the pixel spacing, which includes textures, patterns, thin lines, and all kinds of things, and the real world is full of these box-filter-confounding features.
One interesting question to ask is whether you the viewer can reliably identify the size of a pixel anywhere in the image. If you can see any stepping of any kind, the pixel size is visible, and that means the filter is inadequate and cannot achieve “best possible output quality”. Most people are not sensitive to this at all, but I’ve sat through many filter evaluation sessions with film directors and lighting/vfx supervisors who are insanely sensitive to the differences between well tuned and closely matching Mitchell and Gaussian filters, for example. Personally, for various reasons based on past experience, I think it’s better to err slightly on the side of too blurry than too sharp. I’d rather use a Gaussian than bicubic, but the film people don’t necessarily agree and they think Gaussian is too blurry once you eliminate aliasing. Once you find the sharpest Gaussian you can that doesn’t alias, you will not be able to identify the size of a pixel - image features transition from sharp to blurry as you consider smaller scales, but pixel boundaries are not visible. I’ve never personally seen another filter that does this always, even under contrived scenarios.
That said, I still think it’s tautologically true that box filter is simply not the “best” quality, even if we’re talking about very minor differences. Bilinear and Bicubic are always as good or better, even when the lay person can’t see the differences (or when they don’t know what to look for).
My opinion is that there is no such thing as “best” output quality. We are in a tradeoff space, and the optimal result depends on goals that need to be stated explicitly and elaborated carefully. It depends heavily on the specific display, who/what is looking at the display, what the viewer cares about, what the surrounding environment is like, etc., etc..
* edit just to add that even though I don’t think “best” visual quality exists, I do think box filter can never get there; the contention for the top spot is between the higher order filters, and box filter isn’t even in the running. I had meant to mention that even a single 2D plane that is black on one side and white on the other, when rendered with a box filter, yields an edge in which you can identify visible stepping. If you handle gamma & color properly, you can minimize it, but you can still see the pixels, even in this simplest of all cases. For me, that’s one reason box filter is disqualified from any discussion of high quality rendering.
If there's one thing I've learned from image processing it's that the idea of a pixel as a perfect square is somewhat overrated.
Anti-aliasing is exactly as it sounds: a low-pass filter to prevent artefacts. Convolution with a square pulse is serviceable, but is not actually that good a low-pass filter; you get all kinds of moiré effects. This is why a bicubic kernel that kind of mimics a perfect low-pass filter (which would be a sinc kernel) can perform better.
It is tempting to use a square kernel though, because it's pretty much the sharpest possible method of acceptable quality.
I've been looking into how viable this is as a performant strategy. If you have non-overlapping areas, then contributions to a single pixel can be made independently (since it is just the sum of contributions). The usual approach (computing coverage and blending into the color) is more constrained, where the operations need to be done in back-to-front order.
I've been researching this field for 20 years (I'm one of the developers of AmanithVG). Unfortunately, no matter how fast they are made, all the algorithms to analytically decompose areas involve a step to find intersections, and therefore sweepline approaches that are difficult to parallelize and so must be done on the CPU. However, we are working on it for the next AmanithVG rasterizer, so I'm keeping my eyes open for all possible alternatives.
I ran across https://dl.acm.org/doi/pdf/10.1145/72935.72950 a few weeks ago; it seems like a potential non-sweepline, highly parallel method. I've had some promising results from first doing a higher-dimensional Hilbert sort (giving spatial locality) and then pruning a very large percentage of the quadratic search space. It might still be too slow on the GPU. I'm curious if you have any write-ups on things that have been explored, or if I'd be able to pick your brain some time!
No, Vello does not analytically find intersections. Compositing is (currently) done by alpha blending, which is consistent with the W3C spec but has its own tradeoffs.
> compute the non-overlapping areas of each polygon within each pixel
In the given example (a periodic checkerboard), that would be impossible, because the pixels that touch the horizon intersect an infinite number of polygons.
Not that TFA solves that problem either. As far as I know the exact rendering of a periodic pattern in perspective is an open problem.
The web page is able to slow down my Android phone to the point that it stops responding to power button click. If you told me this page exploits a 0-day vulnerability in Android I would have believed it. Impressive.
This. Unfortunately, I couldn't read beyond the first page, since it keeps crashing my (desktop) browser. Would be nice to have a button to stop/remove all animations in the page, so I could actually read the rest.
You could say it's a denial-of-service attack on your phone. But I guess it's not exactly a secret that misbehaving websites that you visit can slow down your phone.
(However, it seems wrong that Android doesn't set things up via e.g. cgroups or whatever to make sure that the browser can't hog all the resources. You'd want to reserve say 5% of CPU and RAM for use by system tasks, perhaps? (Reserve in the sense that these system tasks can pre-empt anyone else using these resources, not that no one else can use them.))
I do think such mechanisms exist in various places. CPU intensive browser pages (e.g. running an infinite loop in JavaScript) would leave the page unresponsive, but the browser and the OS in general is still fine. You can easily close the page causing trouble. Well, at least in desktop Chrome and Firefox. I definitely haven't seen one page slowing the entire browser since the IE days. On the other hand, if a (general) Android app is not responsive, there is also a dialog inviting you to kill it ("[app name] isn't responding - Close app").
But this one is different. I don't know the underlying mechanisms for the browser and the OS, but it almost feels like a bug.
The opening statement makes it out that this exact calculation is supposed to be superior to multisampling, but the opposite is the case. Computing the exact mathematical coverage of a single polygon against a background is useless for animations if you can't seamlessly stitch multiple polygons together. And that's why GPUs use multisampling: Each sample is an exact mathematical point that's covered by either polygon at the seam, without the background bleeding through.
You might be making some incorrect assumptions about what this article is describing. It’s not limited to a single polygon against a background.
Analytic integration is always superior to multisampling, assuming the same choice of filter, and as long as the analytic integration is correct. Your comment is making an assumption that the analytic integration is incorrect in the presence of multiple polygons. This isn't true, though: the article is using multiple polygons, even if the demo is limited in multiple ways for simplicity and doesn't appear to handle every arbitrary situation.
The limitations of the demo (whether it handles overlapping polygons, stitched meshes, textures, etc.) do not have any bearing on the conceptual point that computing the pixel analytically is better than taking multiple point samples. GPUs use multisampling because it’s easy and finite to compute, not because it’s higher quality. Multisampling is lower quality than analytic, but it’s far, far easier to productize, and it’s good enough for most things (especially games).
This is correct! My CPU implementation of this code can handle the overlapping polygons / meshes / textures. https://phetsims.github.io/alpenglow/#depthSort (takes forever to load, sorry!) is an example of using the analytic approach to render a phong-shaded teapot, where it splits things into adjacent but non-overlapping polygons (from a source of VERY overlapping polygons).
If you can't stitch polygons together seamlessly, how can you be sure the background doesn't bleed through with sampling? Isn't computing the exact coverage the same as having infinitely many point samples? The bleed-through of the background would then also be proportional to the gap between polygons, so if that one's small, the bleeding would be minor as well.
No, in animated models, there is no gap between polygons. And if you only compute single-polygon coverage, you can’t determine whether for two polygons that each cover 50% of a pixel, they both cover the same 50%, or complementary 50%, or anything in between. In practice, systems like that tend to show something like 25% background, 25% polygon A and 50% polygon B for the seam pixels, depending on draw order. That is, you get 25% background bleed.
Does anybody have a link to a non-interactive paper or article about this? It turns my smartphone into a paperweight. I assume memory pressure or WebGPU failing.
[1]: https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/Con...
[2]: https://docs.google.com/document/d/16dlcHvvLMumRa5MAyk2Du_Ms...
[3]: https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/A.2...