I love to see more work in this space. It's clear that GPU compute is the future of 2D rendering, and we need to explore a bunch of different approaches to find the best one. I especially appreciate the focus on rendering quality; the author is absolutely correct that the current state of Vello has conflation artifacts and does not do antialiasing in the correct (linear) colorspace.
We do have a plan for conflation free compositing[1] which should closely approximate the quality of the samples here. That in turn depends on sparse strips[2], though an experiment with degraded performance could be done to validate the quality outcomes. Sparse strips in turn depend on a high performance segmented sort[3].
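For readers who haven't seen a conflation artifact: here's a toy calculation (not Vello code, and not the planned compositing scheme, just an illustration of the effect) showing why compositing two abutting antialiased paths independently leaks background through the shared seam:

```rust
// Toy illustration of a conflation artifact (not Vello code): two paths of
// the same color abut along a pixel's midline, each covering half the pixel.
fn over(src: f32, cov: f32, dst: f32) -> f32 {
    // Source-over compositing, with coverage used as alpha.
    src * cov + dst * (1.0 - cov)
}

fn main() {
    let bg = 1.0; // white background
    let fg = 0.0; // black paths
    let coverage = 0.5; // each path covers exactly half the pixel

    // Conflated: each path is antialiased and composited independently,
    // so the background bleeds through along the shared edge.
    let conflated = over(fg, coverage, over(fg, coverage, bg));

    // Conflation free: coverage is resolved across both paths first and
    // composited once, so the seam stays fully opaque.
    let resolved = over(fg, (coverage + coverage).min(1.0), bg);

    println!("conflated = {conflated}, conflation free = {resolved}");
    // Prints: conflated = 0.25 (a visible light seam), conflation free = 0
}
```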
The analytic approach to path overlaps is intriguing, but I think it will be very challenging to implement efficiently on GPU. I'm looking forward to seeing the results.
The analytic approach to occlusion definitely does seem like a "humbling parallelism" type of problem on the GPU. My curiosity is leading me to explore it anyway, and it may turn out to be reasonable if I can find alternatives to large GPU sorts (although I understand you've done some work on that recently). I think the Vello approach is very likely the superior option for the best general quality/performance tradeoff.
Ah, you watched my "good parallel computer" talk[*] :)
If I were going to take it on, I'd start with BVH construction - the H-PLOC paper at the latest HPG [1] looks promising - then traverse down the hierarchy until you get a very small number of path segments, so you can compare them pairwise. Obviously, any time there is an intersection you need to keep at least the two segments involved.
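Roughly, the broad phase I have in mind looks like the sketch below (a CPU Rust sketch with stand-in types; the H-PLOC build itself and the GPU work-queue formulation are elided):

```rust
// Sketch of a broad phase over a segment BVH (types are stand-ins; the
// H-PLOC build is elided). Recurse over pairs of overlapping nodes until
// both sides are small leaves, then emit candidate segment pairs for the
// exact pairwise intersection test.
#[derive(Clone, Copy)]
struct Aabb { min: [f32; 2], max: [f32; 2] }

impl Aabb {
    fn overlaps(&self, o: &Aabb) -> bool {
        self.min[0] <= o.max[0] && o.min[0] <= self.max[0]
            && self.min[1] <= o.max[1] && o.min[1] <= self.max[1]
    }
}

enum Node {
    // Leaves hold a handful of segment indices; inner nodes hold children.
    Leaf { bounds: Aabb, segs: Vec<u32> },
    Inner { bounds: Aabb, left: Box<Node>, right: Box<Node> },
}

fn node_bounds(n: &Node) -> Aabb {
    match n {
        Node::Leaf { bounds, .. } | Node::Inner { bounds, .. } => *bounds,
    }
}

// Collect all candidate pairs (i, j) of segments whose leaf bounds overlap.
fn collect_pairs(a: &Node, b: &Node, out: &mut Vec<(u32, u32)>) {
    if !node_bounds(a).overlaps(&node_bounds(b)) {
        return;
    }
    match (a, b) {
        (Node::Leaf { segs: sa, .. }, Node::Leaf { segs: sb, .. }) => {
            for &i in sa {
                for &j in sb {
                    if i < j {
                        out.push((i, j));
                    }
                }
            }
        }
        (Node::Inner { left, right, .. }, _) => {
            collect_pairs(left, b, out);
            collect_pairs(right, b, out);
        }
        (_, Node::Inner { left, right, .. }) => {
            collect_pairs(a, left, out);
            collect_pairs(a, right, out);
        }
    }
}
```

Called as collect_pairs(&root, &root, &mut pairs) it also covers self-intersection within a single path; on the GPU the recursion would become an explicit work queue, which is where the hard part lives.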
This seems hard to me, humbling even. I mean, overlap removal is hard enough on the CPU, especially because it's so sensitive to numerical robustness, and doubly so for curves. But I think you'll learn something by trying!
I'm curious what your thoughts are on my approach for robustness.
I've written similar operations[1] that include elliptical arcs AND Bézier curves, and robustness has been an issue. A number of polygon clipping approaches work around this by using ONLY line segments and fixed-precision numbers.
If I discretize the line segments to 20 bits (fixed point), potentially per tile (to reduce inaccuracies), then I can represent exact rational intersections (and parametric t values) with 64-bit numerators and 64-bit denominators[2].
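Concretely, the arithmetic I have in mind looks something like this (a CPU sketch with made-up types, assuming coordinates are already snapped to an unsigned 20-bit grid):

```rust
// Exact segment intersection on a 20-bit integer grid (CPU sketch,
// hypothetical types). With coordinates in 0..2^20, cross products stay
// within ~41 bits and the point numerators within ~62 bits, so i64 suffices.
#[derive(Clone, Copy)]
struct P { x: i64, y: i64 } // grid coordinates in 0..2^20

fn sub(a: P, b: P) -> P {
    P { x: a.x - b.x, y: a.y - b.y }
}

fn cross(a: P, b: P) -> i64 {
    a.x * b.y - a.y * b.x
}

/// Intersection of segments p0->p1 and q0->q1 as exact rationals:
/// returns (t_num, x_num, y_num, den), with t = t_num / den along p and
/// the point (x_num / den, y_num / den). None if the segments are parallel
/// or the intersection lies outside either segment.
fn intersect(p0: P, p1: P, q0: P, q1: P) -> Option<(i64, i64, i64, i64)> {
    let r = sub(p1, p0);
    let s = sub(q1, q0);
    let den = cross(r, s); // at most ~41 bits on the 20-bit grid
    if den == 0 {
        return None; // parallel or collinear; handled separately
    }
    let pq = sub(q0, p0);
    let t_num = cross(pq, s); // t = t_num / den along p
    let u_num = cross(pq, r); // u = u_num / den along q
    // Keep only intersections inside both segments (den may be negative).
    let in_range = |num: i64| (num >= 0 && num <= den) || (num <= 0 && num >= den);
    if !in_range(t_num) || !in_range(u_num) {
        return None;
    }
    // Point = p0 + t * r over a common denominator; the numerators stay
    // under ~62 bits, so everything still fits in i64.
    let x_num = p0.x * den + t_num * r.x;
    let y_num = p0.y * den + t_num * r.y;
    Some((t_num, x_num, y_num, den))
}
```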
The computation cost concerns me quite a bit (and the memory, if I need to store them), but using this scheme to guarantee correctness (and ordering) of intersections does seem to work in CPU proofs of concept. Perhaps it's possible to use this expensive method only to disambiguate cases that can't be resolved with floating-point arithmetic.
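If that pans out, it would look like the classic floating-point filter: compare in f64 first, and drop into exact arithmetic only when the float result is too close to call. A rough sketch of what I mean (the error bound is a placeholder I haven't derived, so treat it as illustrative):

```rust
use std::cmp::Ordering;

/// Exact parametric position t = num / den along a segment, with den
/// normalized to be positive when the value is constructed.
#[derive(Clone, Copy)]
struct RationalT {
    num: i64,
    den: i64,
}

/// Compare a.num/a.den vs b.num/b.den: try f64 first, and fall back to
/// exact 128-bit cross-multiplication only when the float result is too
/// close to call. The error bound below is a placeholder, not a proven one.
fn cmp_t(a: RationalT, b: RationalT) -> Ordering {
    let lhs = a.num as f64 * b.den as f64;
    let rhs = b.num as f64 * a.den as f64;
    // The exact products can need ~100+ bits, so the f64 values are rounded;
    // only trust their ordering when they are clearly separated.
    let err = (lhs.abs() + rhs.abs()) * 4.0 * f64::EPSILON;
    if (lhs - rhs).abs() > err {
        lhs.partial_cmp(&rhs).unwrap()
    } else {
        // Ambiguous in f64: decide exactly in 128-bit integer arithmetic.
        (a.num as i128 * b.den as i128).cmp(&(b.num as i128 * a.den as i128))
    }
}
```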
My initial GPU concept involved sorting the intersected segments, but a radix sort over 128-bit keys seems like a non-starter. If I go that route, would a merge sort still be viable?
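With an exact comparator like the one sketched above, a comparison-based merge sort could operate on the rational keys directly, no radix pass needed. On the CPU that's just (reusing the hypothetical RationalT/cmp_t from the previous sketch, with a made-up record type):

```rust
/// Hypothetical intersection record: the segment it lies on plus its exact
/// parametric position (RationalT and cmp_t from the sketch above).
struct Hit {
    segment: u32,
    t: RationalT,
}

fn sort_hits(hits: &mut [Hit]) {
    // Rust's stable sort_by is merge-sort based, so an exact comparator is
    // all that's needed; a GPU version would be a segmented merge sort
    // keyed the same way (segment id, then exact t).
    hits.sort_by(|a, b| a.segment.cmp(&b.segment).then_with(|| cmp_t(a.t, b.t)));
}
```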
[1]: https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/Con...
[2]: https://docs.google.com/document/d/16dlcHvvLMumRa5MAyk2Du_Ms...
[3]: https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/A.2...