Impressive work, Raph! As I'm sure everyone knows by now, the PostScript imaging model is hard to make work on the GPU, and its traditional geometry pipeline isn't well suited to curves or analytic coverage-based AA.
Since I come from the games space, a lot of my thoughts have been about baking as much data as possible: if we had infinite build time, or artist tooling akin to Maya, how would that help us develop better UIs? If we leave PostScript behind and think about new graphics models, what can we do? So a lot of my own research has been studying really old graphics history, since there are always fun ways of looking at the problem from before we settled on one solution. I'll have to collect some of my research and demos and publish them sometime soon.
Although, one thing I've noticed during my time in games is that the frame-time budget is the real constant: if we speed up our rasterizer, we'll just add more junk to the scene until we hit our 60fps frame target again. More blurs! More effects! :)
> its traditional geometry pipeline isn't well suited to curves or analytic coverage-based AA
I actually don't agree with this: Pathfinder shows that analytic AA works quite well with the GPU rasterizer. You simply use a floating-point render target with additive blending to sum signed areas. In fact, as long as you convert to tiles first, which both piet-metal and Pathfinder do, vector rendering is actually a task very well suited for the GPU.
The hard part is that filling a path is a fundamentally sequential operation, since whether a pixel is on or off depends on every other path that intersects the scanline that pixel is on. This means that you either need an expensive sequential pass somewhere, or you lose work efficiency. Both piet-metal and Pathfinder try to strike a balance by doing some parts sequentially and sacrificing some work efficiency, using tiling to keep the efficiency loss bounded. This approach turns out to work well.
> The hard part is that filling a path is a fundamentally sequential operation, since whether a pixel is on or off depends on every other path that intersects the scanline that pixel is on.
I'm not quite sure I get this. Are you simply talking about overdraw optimizations and the requirement for in-order blending here, or something else like even-odd fill rules?
Obviously, the hardware itself has a massive serial component in the form of the ROP which will make sure blend draws retire in-order, and pixel-shader interlock gives you a bit more granular control over the scheduling without relying on the fixed-function ROP unit.
I'm talking about the fill rule (whether even-odd or winding). Given a path outline made of moveto/lineto/etc. commands, you can't tell just by looking locally at a pixel whether it should be filled or not without looking at every path segment that intersects the scanline it's on.
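To make that concrete, here's a minimal CPU-side sketch (hypothetical, in Rust) of the nonzero winding test for a path flattened to line segments - note that classifying even a single pixel has to consider every segment that crosses its scanline:

    struct Seg { x0: f32, y0: f32, x1: f32, y1: f32 }

    // Nonzero winding test for one point: count signed crossings of the
    // horizontal ray to the right of (px, py) against every segment.
    fn winding_number(segs: &[Seg], px: f32, py: f32) -> i32 {
        let mut w = 0;
        for s in segs {
            let down = s.y0 <= py && s.y1 > py; // segment crosses the scanline going one way
            let up = s.y1 <= py && s.y0 > py;   // ... or the other
            if down || up {
                let t = (py - s.y0) / (s.y1 - s.y0);
                let x = s.x0 + t * (s.x1 - s.x0); // x coordinate of the crossing
                if x > px {
                    w += if down { 1 } else { -1 };
                }
            }
        }
        w // pixel is filled iff w != 0 (or iff w is odd, for even-odd)
    }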
I have no idea about the topic being discussed here, but this sounds like you basically have to do a:
    for each path_outline in all_path_outlines:
        for each pixel in all_pixels:
            check whether pixel is inside/outside path_outline
?
That looks like a massively parallel problem to me: you can parallelize both for loops (all paths in parallel, all pixels in parallel) and then do a reduction for each pixel, which can be a parallel reduction.
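Something like this sketch, roughly (hypothetical names, plain Rust with rayon standing in for a GPU dispatch; the crossing test is the usual signed ray-crossing count):

    use rayon::prelude::*;

    struct Seg { x0: f32, y0: f32, x1: f32, y1: f32 }

    // Signed crossing of the horizontal ray through (px, py) against one segment.
    fn crossing(s: &Seg, px: f32, py: f32) -> i32 {
        let down = s.y0 <= py && s.y1 > py;
        let up = s.y1 <= py && s.y0 > py;
        if !(down || up) { return 0; }
        let t = (py - s.y0) / (s.y1 - s.y0);
        let x = s.x0 + t * (s.x1 - s.x0);
        if x > px { if down { 1 } else { -1 } } else { 0 }
    }

    // All pixels in parallel; per pixel, a (parallel) reduction over all segments.
    fn fill_mask(segs: &[Seg], width: usize, height: usize) -> Vec<bool> {
        (0..width * height)
            .into_par_iter()
            .map(|i| {
                let px = (i % width) as f32 + 0.5;
                let py = (i / width) as f32 + 0.5;
                let w: i32 = segs.par_iter().map(|s| crossing(s, px, py)).sum();
                w != 0
            })
            .collect()
    }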
Some acceleration data structures for these kinds of problems can also be constructed in parallel and on the GPU (e.g. bounding volume hierarchies), and some methods for inside/outside checking might be more amenable to parallelization than even-odd or winding number (e.g. level-set / signed-distance fields).
> That looks like a massively parallel problem to me: you can parallelize both for loops (all paths in parallel, all pixels in parallel) and then do a reduction for each pixel, which can be a parallel reduction.
The reduction step is still less work-efficient than the sequential algorithm (O(n log n) work vs. O(n)), even if it can be faster due to the increased parallelism.
But yeah, if you go down this road you will eventually end up with a delta coverage algorithm. You've basically described what I'd like to do in Pathfinder for an optional compute-based tiling mode. (Because I have to work on GL3, though, I can't depend on compute shader, so right now I do this work in parallel on CPU.)
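For the curious, here's a rough CPU-side sketch of the delta-coverage idea (binary coverage only, illustrative names, no tiling - not Pathfinder's actual code): each crossing deposits a ±1 winding delta into the pixel grid, and a prefix sum along each scanline then recovers the fill. On GPU the scan becomes a parallel prefix sum, and an antialiased version accumulates fractional area deltas instead.

    struct Seg { x0: f32, y0: f32, x1: f32, y1: f32 }

    fn fill_mask_delta(segs: &[Seg], width: usize, height: usize) -> Vec<bool> {
        // Pass 1: deposit winding deltas. (Unoptimized: a real implementation
        // only visits the scanlines each segment actually spans.)
        let mut delta = vec![0i32; width * height];
        for s in segs {
            for y in 0..height {
                let py = y as f32 + 0.5;
                let down = s.y0 <= py && s.y1 > py;
                let up = s.y1 <= py && s.y0 > py;
                if down || up {
                    let t = (py - s.y0) / (s.y1 - s.y0);
                    let xc = s.x0 + t * (s.x1 - s.x0);
                    // First pixel whose center lies at or to the right of the crossing.
                    let xi = (xc - 0.5).ceil().max(0.0) as usize;
                    if xi < width {
                        delta[y * width + xi] += if down { 1 } else { -1 };
                    }
                }
            }
        }
        // Pass 2: prefix-sum each scanline to turn deltas into winding numbers.
        let mut mask = vec![false; width * height];
        for y in 0..height {
            let mut w = 0;
            for x in 0..width {
                w += delta[y * width + x];
                mask[y * width + x] = w != 0;
            }
        }
        mask
    }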
> Some acceleration data structures for these kinds of problems can also be constructed in parallel and on the GPU (e.g. bounding volume hierarchies)
Yes, that's the tiling step that both piet-metal and Pathfinder do.
I think it is quite funny that 2D vector graphics and 3D problems like computational fluid dynamics simulations have to solve pretty much the same problem.
In 3D we use STL geometries, NURBS/T-splines from CAD, and signed-distance fields, often all of them simultaneously in the same simulation, and for a big 3D volume (with 10^9-10^10 "cells"/"3D pixels") we have to figure out whether these are inside or outside. Our 3D domain is adaptive and dynamic to track the movement of bodies and features of the flow, so we have to update it on every iteration, and all of this has to happen in distributed memory on 100,000-1,000,000 cores, without blocking.
There is a lot of research about, e.g., how to update signed-distance fields quickly, in parallel, and in distributed memory when they slightly move or deform, as well as how to use signed-distance fields to represent sharp corners, how to extract the input geometry "as accurately as possible" from a signed-distance field, how big the maximum error is, etc. The Journal of Computational Physics, Computational Methods in Applied Mechanics, and the SIAM journals are often full of this type of research.
For computer graphics, errors are typically OK. But for engineering applications, the difference between a "sharp" edge and a smoothed one can be a completely different flow field, which results in completely different physical phenomena (e.g. turbulent vs. laminar flow) and completely different loads on a structure.
That's tessellation, and it is one approach to vector graphics on GPU. Unfortunately, it's difficult to implement (floating point/fixed point error will frequently ruin your day) and makes it hard to implement analytic antialiasing, which is why Pathfinder doesn't use it. But it is a possible approach.
These are all interesting questions. I've been following some of the stuff on Shadertoy, and I think that points the way. Most is 3d, but not all, and in particular there's a lot to be done with distance fields.
I've been thinking about blur (and mentioned some approximation techniques), but it's expensive in any context. Maybe if we think in shaders instead of the blur tool, we'll find a visual palette that is both fresh and efficient.
Distance fields are great! What do you think an alternative to PostScript's imaging model (Porter/Duff) formulated in terms of signed distance fields would be like?
Much of Photoshop's power comes from representing the selection as first-class alpha channels that you can use all of Photoshop's editing and filtering tools on: you can select something with any of the selection tools, then step back into "quick mask" mode and edit your selection in the "channel" domain with any of the painting tools or filters, and store selections in extra channels for later use.
I wonder how an image editor like Photoshop (and a rendering model in general) could directly and conveniently support signed distance fields as shapes, selections, and channels, as nicely as it supports alpha channels?
Back in 1993 (5 years after Photoshop was created in 1988), it sounded revolutionary to use texture mapping as a fundamental drawing primitive, because nobody suspected some day everybody'd be carrying around more computrons than an SGI workstation in their phone.
I don't quite know what you mean. I don't think Raph is talking about SDF in the same way that people often talk about it, as a way to use the sampling hardware to improve alpha-tested magnification for fonts and other monochrome shapes. Rather he's talking about computing the distance to a stroke in a shader and using that to approximate a coverage value. You can view that as a type of distance field, and it is, technically, but it's not using the texture mapping hardware as for example the Valve paper does. An image editor that rendered with distance fields in the sense of this article would basically just be a regular vector editor, like Illustrator.
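To make that concrete, the kind of approximation I mean looks something like this sketch (illustrative only, not piet-metal's exact formula; it assumes the edge is locally straight and treats the pixel as a unit-width box filter):

    // d > 0 inside the shape, d < 0 outside, measured in pixels from the pixel center.
    fn coverage_from_signed_distance(d: f32) -> f32 {
        (d + 0.5).clamp(0.0, 1.0)
    }

    // Stroke of half-width hw centered on a curve, with dist the (unsigned)
    // distance from the pixel center to the curve.
    fn stroke_coverage(dist: f32, hw: f32) -> f32 {
        (hw - dist + 0.5).clamp(0.0, 1.0)
    }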
I was thinking that it would be lovely to be able to do the same special-effect tricks (outlines, shadows, dilating, glows, bevels, shading, etc.) that (for example) TextMesh Pro can do with text, but with anything you drew: points, lines, curves, polygons, etc.
There has been research on how to do that with adaptively sampled distance fields [1].
The author (Frisken) also created a 2D drawing tool called Mischief, an insanely fast piece of vector graphics software. It's available in the Mac App Store if you want to try it. I think Foundry owns the patents now and they are using the technology in their products.
> Well known to game developers, what is efficient on GPU is a structure-of-arrays approach. In particular, the tiling phase spends a lot of time looking at bounding boxes, to decide what belongs in each tile. [...] Now we get to the heart of the algorithm: going through an array of bounding boxes, looking for those that intersect a subset of tiles.
Raph, you're describing ray tracing! I haven't thought a lot about this, but maybe the ray tracing hardware trend really could very much be utilized for 2D UI. With ray tracing support now announced all over the place, are you already thinking along those lines?
At its most basic, the main difference between rasterizing and ray tracing is SoA vs. AoS, i.e., whether the outer loop is over triangles or over pixels. And the per-pixel operation is going through an array of bounding boxes to see what overlaps.
The somewhat amazing thing about the ray tracing hardware is that you get, in effect, your entire search query through all the bounding boxes in a single "instruction". It's such a departure from all the predictable bite-sized hardware operations we're used to. It reminds me of stories from college about the DEC VAX and its single-instruction memcpy & strcmp.
> While I mostly focused on parallel read access, I’m also intrigued by the possibility of generating the scene graph in parallel, which obviously means doing allocations in a multithread-friendly way.
FWIW, this is definitely doable and already happening on the 3d GPU ray tracing side of the world...! Parallel scene graph construction, parallel BVH builds, etc.
> Flutter is a good modern approach to this, and its “layers” are one of the keys to its performance.
Flutter actually does it badly. Layers are terrible for performance because they are a source of jitter (and memory usage), and jitter often results in jank. They are necessary in some cases, like for correct alpha blending, but in general should be avoided if your goal is to be smooth & fast. You want to optimize for your slowest frames, not your fastest ones.
Flutter's excessive layering can also rapidly be a net-loss in performance due to the overhead of render target switching, increase in memory bandwidth required, and fewer opportunities for overdraw avoidance & disabled blending.
Agree on both fronts! (Also, your username seems familiar somehow - have we had discussions around this before? :)
I didn't talk about this in the post because I haven't actually implemented fancy compositing, but based on what I've seen you never want to render to texture (which Flutter does sometimes), as it uses up scarce global GPU memory bandwidth, and if you do that based on heuristics, performance gets very unpredictable. I should express this more clearly: the goal of piet-metal is not to be the fastest renderer (I'm sure it isn't), but the one with the most consistent and predictable performance.
> but based on what I've seen you never want to render to texture (which Flutter does sometimes)
Flutter's "layers" are rasterized to texture if they are unchanged for N frames (I think N = 3?). Which happens kind of a lot, and is (one of) the reasons Flutter has such a big RAM footprint. Flutter's layers remind me a lot of the hacky, terrible will-change CSS property.
Other than that it's just a rather crude way to hack up the rendering commands into smaller display lists. Android has been doing this for ages with RenderNode with both display lists & transform properties on those display lists.
Yes, macOS relies very heavily on CALayer to accomplish smooth scrolling, largely because much of the content of those layers is software-rendered. One thing that I find interesting is the subtle effect this has on the aesthetics of macOS (and iOS also) - lots of semitransparent planes sliding over each other and fading, very few other things that would be expensive in that model.
I think this is a bad approach on modern hardware, for the reasons kllrnohj stated.
Well, in macOS (and Windows) the layers are explicitly created by the programmer: they aren't implicitly created based on heuristics. That's a significant difference.
The Web has historically followed Flutter here in having a complex set of "layerization" heuristics. One of the main motivations of WebRender is to change that.
Patrick is of course right here, macOS isn't quite like Flutter because it basically always texturizes, although ultimately this is under control of the programmers.
And perhaps it would be more accurate to say Flutter follows the Web, as many of the team are former Chrome engineers, though I think they've tried to clean up a lot of the warty stuff.
And absolutely, the idea of relying less on the compositor is not original to me, WebRender has similar goals, though I think I can make things performant, rich, and fairly simple by leveraging compute.
In Vulkan, buffer age is implicit with the FIFO swap-chain model. Agreed that there's currently no SwapWithDamage equivalent to tell the compositor about the damaged region in Vulkan, but if it's wanted, it shouldn't be too difficult of an extension to draft or support. I'm open to throwing it in the queue if wanted. What swapchain platforms were you thinking of it for?
> In Vulkan, buffer age is implicit with the FIFO swap-chain model.
I know, but it's easier to explain the why with something proposing inclusion of the feature specifically, hence the link to EGL extension instead of Vulkan spec.
Although there are differences in whether the "redrawn" area of the buffer can be sampled from. I don't remember where Vulkan landed on that. I haven't read the spec, but I'd assume it has buffer age similar to EGL_KHR_partial_update (content outside the drawn area is preserved, content within the drawn area is not) rather than EGL_EXT_buffer_age (everything is preserved). Otherwise it'd force an initial copy into the tiler, which would be an odd default behavior.
> Agreed that there's currently no SwapWithDamage equivalent to tell the compositor about the damaged region in Vulkan, it shouldn't be too difficult of an extension to draft or support.
If you're talking about improving support for incremental present in Vulkan implementations, let's definitely connect. Certainly I can see how to do it well on Windows/DirectX, using IDXGISwapChain1::Present1. On macOS I think it's impossible (though Chrome sometimes fakes it by pasting update layers on top of the base window in the compositor).
For incremental buffer updates you already know the content of the VkImage you're going to draw into - it's the same thing it was the last time you presented it. So you keep a ring buffer of dirty rects for the last N frames, and when you acquire the next image you union the dirty rects from the frames it missed - that's the area of the buffer you need to update.
This is "standard" Vulkan; there's no platform aspect to this. It has very widespread support in EGL/GL as well, with either the EGL_EXT_buffer_age extension or, more commonly, the EGL_KHR_partial_update extension.
If you then want to limit the area of the screen that's re-composited THAT'S when swapchain extensions or platform support enters the picture. That would be eglSwapBuffersWithDamage or VkPresentRegionKHR. But this optimization has very little performance impact, and typically minimal battery life improvement as well. Worth doing, but the partial re-render of the buffer is far more significant.
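To spell out the bookkeeping (a hedged Rust sketch with made-up names, not tied to any particular API): you remember the damage of each presented frame, and when you acquire an image that is N frames old you redraw this frame's damage unioned with the damage of the N-1 frames that image missed.

    use std::collections::VecDeque;

    #[derive(Clone, Copy)]
    struct Rect { x0: i32, y0: i32, x1: i32, y1: i32 }

    impl Rect {
        fn union(self, o: Rect) -> Rect {
            Rect { x0: self.x0.min(o.x0), y0: self.y0.min(o.y0),
                   x1: self.x1.max(o.x1), y1: self.y1.max(o.y1) }
        }
    }

    // Damage of previously presented frames, most recent first.
    struct DamageHistory { recent: VecDeque<Rect>, capacity: usize }

    impl DamageHistory {
        // Region of the acquired image that is stale, given its age
        // (1 = it holds exactly last frame's content) and this frame's new damage.
        fn redraw_region(&self, age: usize, new_damage: Rect) -> Rect {
            self.recent.iter()
                .take(age.saturating_sub(1)) // the frames this image missed
                .fold(new_damage, |acc, &r| acc.union(r))
        }

        // Call after presenting, so future frames know what changed.
        fn record(&mut self, damage: Rect) {
            self.recent.push_front(damage);
            self.recent.truncate(self.capacity);
        }
    }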
> though Chrome sometimes fakes it by pasting update layers on top of the base window in the compositor
For a long time that was also just how chrome handled display list updates. They just had a stack of them, and re-rendering a part of the screen just created an entirely new displaylist and plopped it on top: https://www.chromium.org/developers/design-documents/impl-si... (see "PicturePile")
I wouldn't necessarily follow Chrome as an example of what to do - there's a lot of legacy in Chrome's rendering stack. Also web pages are huge, so they do things like put rendering commands in an rtree because quick rejects happen 100x more often than actual draws.
Something I've been really surprised and disappointed by is the lack of 2d vector graphics support in modern gaming engines.
I'm an amateur enthusiast but I spent hours and hours unsuccessfully trying to find any decent engine where I could make games using svg or other vector formats.
I know what you mean! Unity3D has no way to simply draw a circle or pie chart into a texture.
It would be great to be able to use the canvas api (and any JavaScript library like d3 that uses canvas) to dynamically draw Unity3D textures.
To address that problem (and others), I've been developing a Unity3D extension called UnityJS, which integrates a web browser and JavaScript with Unity3D, so you can use the canvas api to draw images for use as 2D user interface overlays and 3D mesh textures in Unity. And of course the other (main) point of UnityJS is to use JavaScript for scripting and debugging Unity apps, and to integrate Unity apps with off-the-shelf JavaScript web libraries (d3, socket.io, whatever).
Drawing on canvases actually works pretty well in the WebGL build, because the JavaScript runs in the same browser tab and address space that's running the Unity app, so you can efficiently blit the canvas's pixels right into Unity's texture memory.
(Although I don't believe there's a way to share the canvas texture directly through GPU memory, or even directly share Unity's ArrayBuffer memory, but it's fast enough for interactive user interface stuff, and millions of times better than the "portable" technique of writing out a data uri with a base 64 encoded compressed png file, which is the "standard" way of getting pixels out of the web browser! Web tech sure sucks!)
But drawing that circle (or a pie chart) into a texture (so you don't have to draw each triangle every frame) is a whole can of worms.
It might be fine in some cases on a big honking workstation, but reducing the number of draws and triangles becomes quite important when you're doing mobile user interfaces or VR.
Also, it would take a hell of a lot of programming to duplicate the canvas api with LineRenderer and dynamic meshes.
And then you've rolled yourself a custom C# drawing API, so there aren't any full featured well supported off-the-shelf libraries like d3 or chartjs to drive it.
One of the nice things about the way UnityJS integrates JavaScript with a standard web browser in Unity apps is that you can use any of the enormous ecosystem of existing off-the-shelf JavaScript libraries without modification (both visual canvas drawing, and JSON based data wrangling libraries and APIs for services).
It's all about leveraging existing JavaScript libraries and web technologies, instead of reinventing half-assed Unity C# versions.
Given any JSON based web service or SDK you might want to talk to (google sheets or slack for example), the chances of finding a working well supported JavaScript library that talks to it are a lot higher than finding an equivalent C# library that runs in Unity. (Socket.io is a good example: there exist some for Unity, but they all have problems and limitations, and none of them compare in quality and support to the standard JavaScript socket.io library.)
You've described exactly why I want to integrate Pathfinder into Unity [1]. :) Pathfinder offers a subset of HTML canvas that I would like to expand over the next few weeks and months into the full API. (Ideally it will actually be the HTML canvas implementation for Firefox someday.)
There's also the benefit that Pathfinder can transform the canvas in 3D while keeping the content in vector form (i.e. no quality loss), which is important for VR.
> integrates a web browser and JavaScript with Unity3D, so you can use the canvas api to draw images for use as 2D user interface overlays and 3D mesh textures in Unity.
Great use of resources, you could also use ogv.js for animated particle textures :-P
I would love to see someone step up and add Pathfinder support to Unity, Godot, and other popular engines. I'd be very interested in taking pull requests to this effect :)
> it seems quite viable to implement 2D rendering directly on GPU, with very promising quality and performance.
I guess we all agree that GPU rendering is the future.
Yet I'm still in favor of libraries that abstract the actual renderer so it can fall back if needed.
Especially when the future and the past conflict.
I come from audio development, where people use plug-ins (dynamic libs) that load into a host, which provides a native OS window you compose into.
Cross-platform is pretty vital so there are frameworks to target macOS, Windows, iOS (and sometimes Android & Linux).
Here is a real life scenario that many audio developers are currently at:
- you had a product with a UI that needed to show 'real-time' visualization of a signal - an analyzer (let's say an FFT).
- eventually you went the OpenGL way (because it was the right way in 2010...) and wrote shaders.
- it worked great on macOS and OK on most Windows machines (some old machines had limited or broken OpenGL in their drivers).
- Apple deprecates OpenGL in favor of Metal...
So now what was "modern" 9 years ago requires a complete rewrite. If there were an abstraction layer, it might just be minor changes and switching to a new Metal renderer instead of the OpenGL one.
In comparison, the native/CPU C++ code for the same product has proven to be much more future-proof and only required simple maintenance.
It is indeed important work, but I guess targeting Vulkan is better, especially when there are libs like MoltenVK.
I guess the question is how important a factor portability is vs. optimal performance.
I've been through the same iterations you talked about, writing game graphics in MacOS Classic QuickDraw, then OS X Cocoa, then OpenGL, then ES2, now Unity...
IMHO this is all going the wrong direction. I'm strongly against SIMD as a pattern because it makes things hard to generalize. The fact that we don't even know the architectures of the video cards we use every day is a huge red flag. I mean, that's the point we're trying to get to with abstractions, but SIMD exacerbated the pain by being so opinionated.
The reason all of this is going in such strange (informal) directions is that video cards are proprietary and the money is in photorealistic rasterization. It's working along one branch of the tree of possible architectures. Then CUDA and TensorFlow etc are attempts to overlay other narrow generalizations over that and IMHO aren't nearly as good of approaches as MATLAB/Octave/Erlang/Go, etc.
I've said this many times, but I'd prefer to have an array of general purpose CPUs, at least 256 and their number should roughly double in a predictable way every year or two. Then we could have abstractions over that that handle data locality or cache coherency (if needed... it most likely isn't). A chip like that would be trivial to use for ray tracing, for example.
We're going to continue seeing this uninspired churn burning out developers until there is a viable high-core CPU to compete with GPUs.
> what was "modern" 9 years ago requires a complete rewrite.
Have you thought about porting to GLES (shouldn't be that hard if you already have GL shaders), and using MoltenGL (apple) and ANGLE (windows)? AFAIK they both work reasonably well in practice.
> I guess targeting Vulkan is better
Vulkan support is improving but I’m not sure it’s ready yet. For Intel GPUs, only supported on Windows starting from Skylake (2015).
> the question is how much portability is important factor vs optimal performance
I think portable and fast cross-platform GUI is possible, just expensive to implement and especially support.
Commercial game engines like Unity or Unreal optimize both at the same time, and they support low-spec platforms like mobile phones.
When Microsoft wanted fast rich GUI portable across their platforms, they’ve built XAML.
When Google wanted the same, they acquired Skia and built Chromium and now Flutter on top.
What an interesting rabbit hole. Thanks for sharing your notes too!
Re: > "As a digression, I find it amusing that the word for packing a data structure into a byte buffer is “serialization” even when it’s designed to be accessed in parallel. Maybe we should come up with a better term, as “parallel-friendly serialization” is an oxymoron."
NeXTStep/Cocoa calls this simply "encoding" which always made sense to me.
>Other than that, the serialization format is not that exotic, broadly similar to FlatBuffers or Cap’n Proto. As a digression, I find it amusing that the word for packing a data structure into a byte buffer is “serialization” even when it’s designed to be accessed in parallel. Maybe we should come up with a better term, as “parallel-friendly serialization” is an oxymoron.
Great observation! I get the feeling there must be a great punny term for the enigmatic oxymoron "parallel-friendly serialization", but I can't quite put my finger on it.
(Reminds me of "de-optimizer" => "pessimizer", or "spectral frequency analysis and filtering" => "cepstral quefrency alanysis and liftering".)
Marshalling has precedent in some languages. It makes a lot of sense. "serde[s]" always reminds me of sequencing a transmission more than a concern about the format of what goes where in a buffer.
> I’m particularly interested in the rendering quality.
Shouldn't the rendering be done using oversampling? E.g. instead of 100x100, you render on 400x400, then scale back down to 100x100 to get the best result.
One of the problems is that anti-aliasing creates pixels which are half-on, e.g. at the edge of a line or polygon. Now if multiple edges meet near a certain pixel, then rendering the polygon(s) will touch a "half-on" pixel several times, and as a result it may receive the wrong opacity.
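(For concreteness, the downsample half of that scheme is just a box filter - a sketch with illustrative names: render coverage at 4x in each axis, then average each 4x4 block down to one output pixel.)

    // Average each 4x4 block of the high-res coverage buffer down to one pixel.
    fn downsample_4x(hi: &[f32], hi_w: usize, hi_h: usize) -> Vec<f32> {
        let (w, h) = (hi_w / 4, hi_h / 4);
        let mut out = vec![0.0f32; w * h];
        for y in 0..h {
            for x in 0..w {
                let mut sum = 0.0;
                for sy in 0..4 {
                    for sx in 0..4 {
                        sum += hi[(y * 4 + sy) * hi_w + (x * 4 + sx)];
                    }
                }
                out[y * w + x] = sum / 16.0;
            }
        }
        out
    }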
This (and its optimized cousin MSAA) has a huge performance and bandwidth cost relative to its quality increase. It fixes conflation on shared edges (the problem you describe) but 16x supersampling only gives you 16 possible opacities to work with for antialiased edges.
The analytic approach taken here and by Pathfinder gets to use all 256 possible opacities, and uses less bandwidth than supersampling, but has conflation artifacts. To match that with supersampling you'd need to scale from 100x100 to 1600x1600. Because of this most renderers also have the conflation problem (try it in your browser with an SVG), so a lot of content in the wild works around it.
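Here's a tiny numeric illustration of the conflation artifact (a sketch using plain Porter-Duff "over" on alpha): two shapes share an edge that splits a pixel in half, each contributes coverage 0.5, and compositing them independently leaves a seam.

    // Porter-Duff "over" on the alpha channel only.
    fn over(src_alpha: f32, dst_alpha: f32) -> f32 {
        src_alpha + dst_alpha * (1.0 - src_alpha)
    }

    fn main() {
        let left = 0.5;  // left shape covers half the pixel
        let right = 0.5; // right shape covers the other half
        // Rendering each path separately and compositing: 0.75, a visible seam.
        println!("composited alpha = {}", over(right, left));
        // A conflation-free renderer would see total coverage 1.0 for that pixel.
    }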
As far as I know, you basically get to choose two of antialiasing quality, correctness, and performance. The viable antialiasing options for vectors are:
1. Supersampling/multisampling. This is what you suggested. This provides correctness with coincident edges, but 256xAA (what analytic AA gives you) is far too slow and memory intensive to be practical if you supersample. In general practical implementations of MSAA/SSAA are limited to 16xAA or so, which is a noticeable drop in quality and in fact is still pretty slow relative to a high-quality implementation of analytic AA.
2. Analytic antialiasing. This has problems with coincident edges, but in most cases the effects are minimal. The performance and quality are excellent if implemented properly.
I generally think that analytic AA is the right tradeoff for most applications. Subtle rendering problems on coincident edges are usually a small price to pay for the excellent results, and designers can work around those issues when they come up. A 100% theoretically correct rasterizer can't exist anyway, because of floating point/fixed point precision issues among other reasons.
Interesting, this is new for me. Is it true that for analytic AA you'd have to consider the rendered objects all at once, instead of just rendering one after the other?
Usually with analytic AA you just use Porter-Duff blending, because rendering all the objects simultaneously doesn't buy you much except for performance (which is why piet-metal does it that way). For supersampling AA you can render all the objects at once, like Flash did, and this does improve rendering quality around coincident edges.
Oddly (in my opinion), some rasterizers such as Skia's use supersampling for individual paths and Porter-Duff blending when overlaying different paths on top of one another. This means that you don't have seams between subpaths of the same path, but different paths can have seams on coincident edges. I don't like this tradeoff personally, because it has a lot of the downsides of analytic AA without the benefit of antialiasing quality, but for better or worse this approach is widely used.
Ok. I personally think a 100% correct rendering would have great benefit. The coincident edge problem happens a lot, especially when the scene is computer-generated. Also when overlaying the same object a number of times exactly, the edges lose their AA property. (E.g. in Inkscape, draw a circle, copy it, then paste it exactly on top using ctrl+shift+V and repeat 20 times, and you see the edges becoming jagged).
It is true that it is more correct (see https://github.com/pcwalton/pathfinder/issues/142). But my motivation is much more user interfaces than document rendering, so it's quite reasonable to tell the UI designers, "don't do that," especially if the flip side is butter-smooth performance.
Agreed, and there are some deep issues, for example the rendering of very thin strokes which is quite dependent on resolution. Also keep in mind that the 4x4 oversampling you mentioned does not give nice smooth edges. I'm working on font editing, so the quality and smoothness of the line, to most accurately represent the shape without artifacts, is paramount.
As someone who knows literally nothing about the GPU, the notion that the GPU might not be good for UI graphics seems...odd? Like, it can handle 3D graphics right? Aren't those waaaay more computationally expensive than 2D graphics?
I know that I'm missing something important, but I don't know what it is.
This is a blog post I've been meaning to write for a while now, because it's definitely surprising to learn. But here's the key points:
GPUs are good at triangles. Triangles are very simple -- three planar equations tell you if you are inside or outside the shape. 2D graphics is typically specified as overlapping curves. Determining whether a pixel should be filled or not is a lot trickier -- even determining which side of a Bezier curve you are on is a non-obvious and numerically unstable problem (nitpickers corner: I know about Citardauq and Slug's root-finding approximations).
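To illustrate the triangle half of that (a sketch with made-up names): the inside test for a triangle is just three edge-function sign checks, which is exactly the sort of thing rasterization hardware evaluates per pixel.

    // Signed area of the parallelogram (b - a) x (p - a); its sign says which
    // side of edge a->b the point p is on.
    fn edge(a: (f32, f32), b: (f32, f32), p: (f32, f32)) -> f32 {
        (b.0 - a.0) * (p.1 - a.1) - (b.1 - a.1) * (p.0 - a.0)
    }

    // Inside iff p is on the same side of all three edges
    // (counter-clockwise winding assumed).
    fn inside_triangle(tri: [(f32, f32); 3], p: (f32, f32)) -> bool {
        edge(tri[0], tri[1], p) >= 0.0
            && edge(tri[1], tri[2], p) >= 0.0
            && edge(tri[2], tri[0], p) >= 0.0
    }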
Part of this is a tooling and culture problem. There's a tradition and expectation in 3D to take higher-order surface representations like Catmull-Clark and "bake" them down to triangles, and part of the skillset of a 3D artist is to manage this transition. 2D has no such expectation -- the API is "curveTo" and it's expected to run in real time. And this API is assumed not only in platforms, but also in TrueType fonts and SVG graphics.
There's also some other differences -- in 3D graphics, you can be imprecise with your antialiasing because the viewport constantly shifting with perspective can mask a lot of ugliness, and typical scenes will have a lot of detail and texture. In 2D, the standard is clean silhouettes (which most post-processing AA techniques do not work well on), and stability of small shapes and details (you don't want your letters to bubble and jitter around as you scroll a page or type, like you might see from a temporal antialiasing algorithm).
Ultimately, 3D scenes are constructed mostly with materials and textures -- fancy things inside the shapes, but 2D is the opposite -- very simple solid color or gradient fills on complex shapes.
It's complicated. 2D graphics aren't inherently harder or require more computation than 3D, but they require very different optimizations. I think it boils down to the fact that 2D graphics is ultimately data structure heavy, whereas 3D graphics can often be represented as a huge array of triangles. So there's a lot of literature (some of which I linked, then those papers have a deep literature review), but no one obviously best solution. One thing I think I contributed is a relatively simple solution, especially for doing almost all the work on GPU rather than relying on CPU for things.
Is there any particular reason why recent / future GPUs would be worse at 2D graphics than older versions? I.e., what needs to change, when current methods for 2D graphics have worked for years?
Also not knowing much about the field, GPUs seem to have a lot of unpleasantness in the form of incompatible and/or proprietary APIs, so implementing anything that's supposed to be cross platform (also to old and new GPUs, and future GPUs as they are released) may be hard work.
What makes you say that 2D graphics has worked well? Scrolling often janks, UI designers constantly have to work around limitations in the imaging model and performance quirks (for a long time, clipping to a rounded rect would kill performance), and there were lots of shortcuts with things like gamma for blending. I'm hoping to build something way better, where the UI always updates smoothly, and the UI designer doesn't have to be conscious of the limitations in the renderer.
Are you talking about a specific project you're working on? I'm currently building a UI tool for designers, and I'm extremely interested to hear about any work in that area.
Yes. I'll have more to say soon, but I'm basically building what I think is next-gen UI infrastructure in Rust. That's a project with perhaps overly ambitious scope, so I'm starting with one specific app, namely a font editor. That will be a great showcase for beautifully antialiased vector paths and smooth-as-silk scrolling and interaction, I think.
> I.e., what needs to change, when current methods for 2D graphics have worked for years?
Very nearly every 2D graphics system uses software rasterization for handling generic paths. Many 2D graphics renderers are even just mostly software, using limited GPU acceleration for just basic blitting & blending of textures.
What changed is mobile happened, radically shifting the resolution vs. CPU performance balance, along with decent GPUs being suddenly a basic feature. Getting generic vector path performance up to par in this world to be a basic building block is now interesting.
I've been also working on a 2D UI on the GPU. It's quite amazing actually, compared with the code running on the CPU, the code is very terse. The setup can be somewhat verbose, but the shaders themselves are short. Like a 20 line shader can do miracles.
> It’s often said that GPU is bad at data structures, but I’d turn that around. Most, but not all, data structures are bad at GPU. An extreme example is a linked list, which is still considered reasonable on CPU, and is the backbone of many popular data structures. Not only does it force sequential access, but it also doesn’t hide the latency of global memory access, which can be as high as 1029 cycles on a modern GPU such as Volta.
You're almost correct here, from my understanding. But... I still feel like it's important to note that pointer jumping exists, and linked-list-like structures (linked lists, trees, and graphs) can be traversed in parallel in many cases.
For pointer-jumping in GPUs / SIMD systems, I know that "Data Parallel Algorithms" discusses the technique as applied to a "reduce" operation aka a "scan": https://dl.acm.org/citation.cfm?id=7903
The requirement (not fully discussed in the 'Data Parallel Algorithms' article) is that you still need an array of nodes to be listed off by the SIMD system. But this array of nodes does NOT need to be sorted.
As such, you can accomplish a "GPU reduce" over a linked list in O(lg(n)) steps (assuming an infinite-core machine). This is way slower in practice than the O(lg(n)) parallel reduction over an array, but you're at least within the same asymptotic class.
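For anyone unfamiliar with the trick, here's a toy sequential simulation of pointer jumping (list ranking) in Rust - each round of the loop would be one data-parallel step on a GPU, and after about log2(n) rounds every node knows its distance to the tail:

    // next[i] is the index of the node after node i; the tail points to itself.
    fn list_ranks(next: &[usize]) -> Vec<usize> {
        let n = next.len();
        let mut nxt = next.to_vec();
        let mut rank: Vec<usize> = (0..n).map(|i| if nxt[i] == i { 0 } else { 1 }).collect();
        let mut hops = 1usize;
        while hops < n {
            // On a parallel machine every node updates simultaneously, so read
            // from the old arrays and write fresh ones (double buffering).
            let new_rank: Vec<usize> = (0..n).map(|i| rank[i] + rank[nxt[i]]).collect();
            let new_nxt: Vec<usize> = (0..n).map(|i| nxt[nxt[i]]).collect();
            rank = new_rank;
            nxt = new_nxt;
            hops *= 2;
        }
        rank
    }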
> Performant UI must use GPU effectively, and it’s increasingly common to write UI directly in terms of GPU rendering, without a 2D graphics API as in the intermediate layer.
Y tho? I'm just a passerby in this topic, but I see this sentiment now and again these days, and I don't understand what happened. I don't remember hearing complaints of DirectDraw being too slow, for example.
DirectDraw wasn't a 2D graphics package; you probably mean GDI/GDI+. These libraries were used at much smaller resolutions (1024x768, but fullscreen was pushing it; your window was smaller). GDI could roughly be used at 60Hz, but it didn't support antialiasing or transparency; you needed GDI+ for that. And GDI+ was much slower, even at the smaller resolutions, so you tended to draw into a big bitmap ahead of time and scroll that. Animations in GDI+ were unheard of.
I'm not sure what you mean by '2D graphics package', so it may be a semantics thing here, but DirectDraw was a 2D graphics API on top of DirectX. I only used it a bit and didn't find that, for my uses, the added complexity was worth it over just using GDI, but DirectDraw did do GPU-accelerated 2D rendering. How fast it was in light of today's requirements and hardware, I don't know.
Are you confusing it with Direct2D? DirectDraw was an old API from the D3D5-ish era and basically allocated a front buffer you could draw to with GDI. It was not GPU accelerated.
Well you got me wondering now on what I worked with, but https://en.wikipedia.org/wiki/DirectDraw seems to say DirectDraw was hardware-accelerated too. I guess the point is that there are and have been for a long time GPU accelerated 2D API's in DirectX land.
Very interesting work. I’m wondering if you’ve ever worked with vector tiles? Mostly it’s used for drawing large numbers of vectors on maps, but has a similar architecture and approach for 2D drawing on the GPU via an efficient command-based language based on the extent of the tile.
I don't get it. 2D is a subset of 3D, which the GPU already does. Why the need for tiles and whatnot? Why not render 2D text and other Bezier curves as 3D curves with a z coordinate of 0, projected with an orthographic projection?
The short answer is that 3D is all about triangles, and that's not a good fit for the PostScript-based 2D rasterization model used by most APIs and tools (SVG, Illustrator, Apple CoreGraphics, Sketch...)
See the answers to this question in the same thread:
I know. I was eager to get this out. I'll be working on more visuals for a Libre Graphics Meeting presentation, where I'll include some of this in the Rust 2D graphics talk.