I love to see more work in this space. It's clear that GPU compute is the future of 2D rendering, and we need to explore a bunch of different approaches to find the best one. I especially appreciate the focus on rendering quality; the author is absolutely correct that the current state of Vello has conflation artifacts and does not do antialiasing in the correct (linear) colorspace.
We do have a plan for conflation free compositing[1] which should closely approximate the quality of the samples here. That in turn depends on sparse strips[2], though an experiment with degraded performance could be done to validate the quality outcomes. Sparse strips in turn depend on a high performance segmented sort[3].
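For readers who haven't seen a conflation artifact: here's a toy calculation (not Vello code, and not the planned compositing scheme, just an illustration of the effect) showing why compositing two abutting antialiased paths independently leaks background through the shared seam:

```rust
// Toy illustration of a conflation artifact (not Vello code): two paths of
// the same color abut along a pixel's midline, each covering half the pixel.
fn over(src: f32, cov: f32, dst: f32) -> f32 {
    // Source-over compositing, with coverage used as alpha.
    src * cov + dst * (1.0 - cov)
}

fn main() {
    let bg = 1.0; // white background
    let fg = 0.0; // black paths
    let coverage = 0.5; // each path covers exactly half the pixel

    // Conflated: each path is antialiased and composited independently,
    // so the background bleeds through along the shared edge.
    let conflated = over(fg, coverage, over(fg, coverage, bg));

    // Conflation free: coverage is resolved across both paths first and
    // composited once, so the seam stays fully opaque.
    let resolved = over(fg, (coverage + coverage).min(1.0), bg);

    println!("conflated = {conflated}, conflation free = {resolved}");
    // Prints: conflated = 0.25 (a visible light seam), conflation free = 0
}
```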
The analytic approach to path overlaps is intriguing, but I think it will be very challenging to implement efficiently on GPU. I'm looking forward to seeing the results.
The analytic approach to occlusion definitely does seem like a "humbling parallelism" type of problem on the GPU. My curiosity is leading me to explore it anyway, and it may turn out to be reasonable if I can find alternatives to large GPU sorts (although I understand you've done some work on that recently). I think the Vello approach is very likely the superior option for the best general quality/performance tradeoff.
Ah, you watched my "good parallel computer" talk[*] :)
If I were going to take it on, I'd start with BVH construction - the H-PLOC paper at the latest HPG [1] looks promising - then traverse down the hierarchy until you get a very small number of path segments, so you can compare them pairwise. Obviously, any time there is an intersection you need to keep at least the two segments involved.
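Roughly, the broad phase I have in mind looks like the sketch below (a CPU Rust sketch with stand-in types; the H-PLOC build itself and the GPU work-queue formulation are elided):

```rust
// Sketch of a broad phase over a segment BVH (types are stand-ins; the
// H-PLOC build is elided). Recurse over pairs of overlapping nodes until
// both sides are small leaves, then emit candidate segment pairs for the
// exact pairwise intersection test.
#[derive(Clone, Copy)]
struct Aabb { min: [f32; 2], max: [f32; 2] }

impl Aabb {
    fn overlaps(&self, o: &Aabb) -> bool {
        self.min[0] <= o.max[0] && o.min[0] <= self.max[0]
            && self.min[1] <= o.max[1] && o.min[1] <= self.max[1]
    }
}

enum Node {
    // Leaves hold a handful of segment indices; inner nodes hold children.
    Leaf { bounds: Aabb, segs: Vec<u32> },
    Inner { bounds: Aabb, left: Box<Node>, right: Box<Node> },
}

fn node_bounds(n: &Node) -> Aabb {
    match n {
        Node::Leaf { bounds, .. } | Node::Inner { bounds, .. } => *bounds,
    }
}

// Collect all candidate pairs (i, j) of segments whose leaf bounds overlap.
fn collect_pairs(a: &Node, b: &Node, out: &mut Vec<(u32, u32)>) {
    if !node_bounds(a).overlaps(&node_bounds(b)) {
        return;
    }
    match (a, b) {
        (Node::Leaf { segs: sa, .. }, Node::Leaf { segs: sb, .. }) => {
            for &i in sa {
                for &j in sb {
                    if i < j {
                        out.push((i, j));
                    }
                }
            }
        }
        (Node::Inner { left, right, .. }, _) => {
            collect_pairs(left, b, out);
            collect_pairs(right, b, out);
        }
        (_, Node::Inner { left, right, .. }) => {
            collect_pairs(a, left, out);
            collect_pairs(a, right, out);
        }
    }
}
```

Called as collect_pairs(&root, &root, &mut pairs) it also covers self-intersection within a single path; on the GPU the recursion would become an explicit work queue, which is where the hard part lives.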
This seems hard to me, humbling even. I mean, overlap removal is hard enough on the CPU, especially because it's so sensitive to numerical robustness, and doubly so for curves. But I think you'll learn something by trying!
I'm curious what your thoughts are on my approach for robustness.
I've written similar operations[1] that include elliptical arcs AND Bézier curves, and robustness has been an issue. A number of polygon clipping approaches work around this by using ONLY line segments and fixed-precision numbers.
If I discretize the line segments to 20 bits (fixed point), potentially per tile (to reduce inaccuracies), then I can represent exact rational intersections (and parametric t values) with 64-bit numerators and 64-bit denominators[2].
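Concretely, the arithmetic I have in mind looks something like this (a CPU sketch with made-up types, assuming coordinates are already snapped to an unsigned 20-bit grid):

```rust
// Exact segment intersection on a 20-bit integer grid (CPU sketch,
// hypothetical types). With coordinates in 0..2^20, cross products stay
// within ~41 bits and the point numerators within ~62 bits, so i64 suffices.
#[derive(Clone, Copy)]
struct P { x: i64, y: i64 } // grid coordinates in 0..2^20

fn sub(a: P, b: P) -> P {
    P { x: a.x - b.x, y: a.y - b.y }
}

fn cross(a: P, b: P) -> i64 {
    a.x * b.y - a.y * b.x
}

/// Intersection of segments p0->p1 and q0->q1 as exact rationals:
/// returns (t_num, x_num, y_num, den), with t = t_num / den along p and
/// the point (x_num / den, y_num / den). None if the segments are parallel
/// or the intersection lies outside either segment.
fn intersect(p0: P, p1: P, q0: P, q1: P) -> Option<(i64, i64, i64, i64)> {
    let r = sub(p1, p0);
    let s = sub(q1, q0);
    let den = cross(r, s); // at most ~41 bits on the 20-bit grid
    if den == 0 {
        return None; // parallel or collinear; handled separately
    }
    let pq = sub(q0, p0);
    let t_num = cross(pq, s); // t = t_num / den along p
    let u_num = cross(pq, r); // u = u_num / den along q
    // Keep only intersections inside both segments (den may be negative).
    let in_range = |num: i64| (num >= 0 && num <= den) || (num <= 0 && num >= den);
    if !in_range(t_num) || !in_range(u_num) {
        return None;
    }
    // Point = p0 + t * r over a common denominator; the numerators stay
    // under ~62 bits, so everything still fits in i64.
    let x_num = p0.x * den + t_num * r.x;
    let y_num = p0.y * den + t_num * r.y;
    Some((t_num, x_num, y_num, den))
}
```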
The computation cost concerns me quite a bit (and the memory, if I need to store them), but using this scheme to guarantee correctness (and ordering) of intersections does seem to work in CPU proofs of concept. Perhaps it's possible to use this expensive method only to disambiguate cases that can't be resolved with floating-point arithmetic.
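If that pans out, it would look like the classic floating-point filter: compare in f64 first, and drop into exact arithmetic only when the float result is too close to call. A rough sketch of what I mean (the error bound is a placeholder I haven't derived, so treat it as illustrative):

```rust
use std::cmp::Ordering;

/// Exact parametric position t = num / den along a segment, with den
/// normalized to be positive when the value is constructed.
#[derive(Clone, Copy)]
struct RationalT {
    num: i64,
    den: i64,
}

/// Compare a.num/a.den vs b.num/b.den: try f64 first, and fall back to
/// exact 128-bit cross-multiplication only when the float result is too
/// close to call. The error bound below is a placeholder, not a proven one.
fn cmp_t(a: RationalT, b: RationalT) -> Ordering {
    let lhs = a.num as f64 * b.den as f64;
    let rhs = b.num as f64 * a.den as f64;
    // The exact products can need ~100+ bits, so the f64 values are rounded;
    // only trust their ordering when they are clearly separated.
    let err = (lhs.abs() + rhs.abs()) * 4.0 * f64::EPSILON;
    if (lhs - rhs).abs() > err {
        lhs.partial_cmp(&rhs).unwrap()
    } else {
        // Ambiguous in f64: decide exactly in 128-bit integer arithmetic.
        (a.num as i128 * b.den as i128).cmp(&(b.num as i128 * a.den as i128))
    }
}
```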
My initial GPU concept involved sorting the intersected segments, but a radix sort over 128-bit keys seems like a non-starter. If I go that route, would a merge sort still be viable?
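With an exact comparator like the one sketched above, a comparison-based merge sort could operate on the rational keys directly, no radix pass needed. On the CPU that's just (reusing the hypothetical RationalT/cmp_t from the previous sketch, with a made-up record type):

```rust
/// Hypothetical intersection record: the segment it lies on plus its exact
/// parametric position (RationalT and cmp_t from the sketch above).
struct Hit {
    segment: u32,
    t: RationalT,
}

fn sort_hits(hits: &mut [Hit]) {
    // Rust's stable sort_by is merge-sort based, so an exact comparator is
    // all that's needed; a GPU version would be a segmented merge sort
    // keyed the same way (segment id, then exact t).
    hits.sort_by(|a, b| a.segment.cmp(&b.segment).then_with(|| cmp_t(a.t, b.t)));
}
```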
[1]: https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/Con...
[2]: https://docs.google.com/document/d/16dlcHvvLMumRa5MAyk2Du_Ms...
[3]: https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/A.2...