
Can't the font renderer render a straight line? Like a sans-serif "I"?

The font rasterizer already exists (unless it's a bitmap-font UI, but those aren't common anymore).

How does adding more mechanisms make for a simpler or cleaner rendering pipeline?




The font rasterizer is a massive hack in modern UIs. Subpixel rendering is a serious pain in the ass. When you render text with subpixel rendering, you render the actual vectors at 3x the horizontal resolution. But not simply as if the vectors were 3x wider, because that would look too sharp: it needs to render as if there were 3x as many pixels, which is different.
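
A minimal sketch of the distinction, assuming an RGB-stripe panel. `coverage` is a hypothetical oracle returning outline coverage in [0, 1] at a sample point (a real rasterizer integrates the outline instead), and the filter weights are merely illustrative of the kind of 5-tap FIR FreeType applies as its "LCD filter":

    def stripe_coverages(coverage, px, py):
        # Three horizontal samples per pixel, one per R/G/B stripe. This is
        # the "3x as many pixels" part: each sample is a third of a pixel
        # wide, not a squashed rendering of a 3x-wide glyph.
        return [coverage(px + (i + 0.5) / 3.0, py + 0.5) for i in range(3)]

    def lcd_filter(row):
        # Smear coverage across neighboring stripes so sharp edges don't
        # turn into colored fringes. Weights sum to 1 and are illustrative.
        w = (1/9, 2/9, 3/9, 2/9, 1/9)
        padded = [0.0, 0.0] + list(row) + [0.0, 0.0]
        return [sum(w[k] * padded[i + k] for k in range(5))
                for i in range(len(row))]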

Then there’s compositing. Normal layers can be composited using alpha blending, assuming some sane format like premultiplied-alpha RGBA. But not subpixel-rendered text: it carries a separate coverage value for each color channel, so a single alpha can’t describe it, and alpha blending the components will fuck up the subpixel rendering.
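
Concretely, a sketch (not any particular library’s API): the normal premultiplied "over" operator has one alpha for all three channels, while subpixel text needs three coverages; on GPUs the per-channel case is typically expressed with dual-source blending.

    def over(src, dst):
        # Premultiplied-alpha "over": one alpha covers all three channels.
        sr, sg, sb, sa = src
        dr, dg, db, da = dst
        inv = 1.0 - sa
        return (sr + dr * inv, sg + dg * inv, sb + db * inv, sa + da * inv)

    def over_subpixel(text_rgb, cov_rgb, dst_rgb):
        # Subpixel text: a separate coverage per channel, so there is no
        # single alpha you can hand to the blend above.
        return tuple(t * c + d * (1.0 - c)
                     for t, c, d in zip(text_rgb, cov_rgb, dst_rgb))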

And it goes on, because if you want to handle text like everything else, you need special cases for it to look right. Rotation? You need to render the vectors rotated; you can’t rotate the raster. If you need to render to a surface and then transform that surface, you’re SOL; it can’t go to raster until the end.

Normal surfaces can also be rendered at subpixel positions, and of course this does not work for surfaces containing text, because again, it will destroy the subpixel rendering.

OK. So you can get rid of the subpixel rendering and render slightly blurrier glyphs instead. (R.I.P. anyone trying to tell hanzi/kanji apart.) It’s still going to murder legibility if you move text by a subpixel offset, because text is already on the edge of readability at 96 DPI.

I haven’t even considered gamma correction, hinting, blending different colors, different blend modes, GPU acceleration, etc., because I simply don’t have the brainpower to reconcile it all. It’s a nightmare.

We already did some of this for text, and it was a herculean effort. We use a freakin’ virtual machine to power font hinting, plus ugly, complex, slow special-casing at many layers of already ridiculously complex vector graphics stacks. (If you disagree with that assessment, you may just be smarter than I am, but I have serious trouble following the Skia codebase, and I doubt Cairo is really that much better.)

And speaking of which, there only really seem to be a handful of them out there: Skia, used by most web browsers; Cairo, used by GTK; Direct2D on Windows; Core Graphics (Quartz), which is what modern macOS uses now that QuickDraw is gone; and I guess there’s Mozilla’s Pathfinder, a promising Rust-based vector graphics engine that was built as part of Servo and then seemingly mostly abandoned, much to the world’s detriment. This work is hard. It can be done, but I don’t think it’s something a single engineer can do, if you want to build one that competes with the big boys, even disregarding things like performance. I’d love to be wrong, but I have a sinking feeling I’m not.

Even text isn’t done being overcomplicated. As nyanpasu has mentioned above, some software has started using SDFs (signed distance fields) for font scaling. We do this because text legibility really is that important, whereas a line in the UI being slightly blurry for users on older screens really just isn’t. Some languages flat-out can’t be read with crappy font rendering, and any of them will give you eyestrain if it’s ugly enough. As much as it sucks, a blurry border on a button isn’t an accessibility issue. And rendering at 1x and letting the compositor upscale isn’t a great solution either, because again, it’s already hard enough to read text in some languages; the added blurriness of scaling text and ruining subpixels is basically intolerable.
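
The SDF trick itself boils down to something like this sketch (names illustrative, not from any particular library): store a signed distance to the glyph edge per texel, then at draw time map distance to opacity across roughly one screen pixel, so the edge stays crisp at any scale:

    def sdf_alpha(dist, px_per_texel):
        # dist: signed distance to the glyph edge, in texels, positive
        # inside the glyph. Feather across about one screen pixel.
        half = 0.5 / px_per_texel
        t = max(0.0, min(1.0, (dist + half) / (2.0 * half)))
        return t * t * (3.0 - 2.0 * t)   # smoothstep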

These hacks aren’t free, and with high DPI displays, they’re not needed. There’s a reason Apple did what they did.


OK, but there's clearly an existence proof, and it ran fine on 32-bit machines with slow processors (or even embedded CPUs in the '80s!) way before all the piled-up hacks you're describing were invented.

As I understand it, all that's needed is a vector renderer, and you keep everything (even text) in vector format as long as possible. RGBA then becomes a special case, as it must be for any DPI independent rendering pipeline.

Trying to compose rendered vectors using pixel based operations is madness, so... don't?

That means you can't have a bitmap-based compositor. So what? GPUs are great at rendering vectors. Composite those instead of bitmaps.

Or, just don't composite at all. A decade later, Linux desktop compositors are still an ergonomic regression vs. existing display drivers with vsync and double buffering support.


> OK, but there's clearly an existence proof, and it ran fine on 32-bit machines with slow processors (or even embedded CPUs in the '80s!) way before all the piled-up hacks you're describing were invented.

Yes. Driving ~1024x768 framebuffers, on single-core processors, with far less demanding workloads, but still, yes. (They still badly needed good glyph caching to pull this off; I’m assuming a Windows XP-tier machine, since that was the era when most people started using ClearType/subpixel rendering.)
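
The cache itself is conceptually tiny; a hypothetical sketch (real ones also key on things like subpixel offset and hinting mode):

    _glyphs = {}

    def cached_glyph(face, size_px, glyph_id, rasterize):
        # Rasterize each (face, size, glyph) once, then blit the cached
        # bitmap forever after. `rasterize` stands in for the expensive
        # outline-to-bitmap step.
        key = (face, size_px, glyph_id)
        if key not in _glyphs:
            _glyphs[key] = rasterize(face, size_px, glyph_id)
        return _glyphs[key]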

(Single-core processors are obviously slower than multicore processors, all else equal, but exploiting multicore processors effectively is harder and often leads to code that is at least a bit slower in the single-core case…)

> As I understand it, all that's needed is a vector renderer, and you keep everything (even text) in vector format as long as possible. RGBA then becomes a special case, as it must be for any DPI independent rendering pipeline.

I don’t want to sound like I’m being patronizing, but I get the feeling that you may not be grasping the problem.

We can’t just use text rendering logic to power other vector graphics, for many reasons. Text is not rendered like ordinary vectors, as that would simply be too blurry at 96 DPI: old computers used bitmap fonts or aggressive hinting, and newer computers use anti-aliasing, often subpixel anti-aliasing. Doing that for every line on screen isn’t feasible even if you wanted to write the code. Here’s an attempt to enumerate just the obvious reasons why:

- It’s slow. Yes, old 32-bit computers could do it, yadda yadda. But they did it for text, at the glyph level, and then cached it. They were most certainly not rendering anything near the entire size of the framebuffer this way at once.

- It’s difficult to GPU-accelerate. GPUs can do vector graphics and alpha blending fast, but subpixel rendering as it’s done for text is not something typical GPU rendering paths can express. It can still be made to exploit GPUs, but it requires more work and is slower.

- Fonts achieve better crispness on lower-DPI displays using hinting VMs; without them, many glyphs would be quite blurry. Hinting VMs let the typographers who make font outlines decide specifically when and how vectors should be adjusted to look good on raster displays. In case it isn’t obvious, the problem here is that doing this for every line on the screen requires writing special cases for every line on the screen. Even if you could come up with a general rule that makes everything look good and never produces uneven-looking margins or outlines (you really can’t, but suppose you could; see the sketch after this list), you would still have to run that logic for every line. That’s an increase in complexity.

- Glyphs only need to care about their relationships with each other. UI elements have arbitrary concerns: they have relationships with other things on screen, they line up with other shapes, and the whitespace between them is significant. Glyphs only care about the glyphs horizontally adjacent to them (or vertically, in some scripts), but other UI elements care about their relationship with potentially any neighboring UI element.

- UI rendering code does not exist in a vacuum. At some point, apps will need to do something that requires knowing the size of something on screen, in either physical or logical dimensions. Normally this isn’t a problem, but if all vector rendering were as complex as text, it absolutely would be. The naive way of handling it would be correct in many cases but wrong in many others, just like how old APIs that expose pixels instead of logical units tend to lead to apps with subtle scaling issues.
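
As promised above, here’s why a universal snap-to-grid rule can’t work: even a mild 1.25x scale turns evenly spaced lines into uneven physical gaps.

    scale = 1.25
    edges = [i * 10 for i in range(5)]               # 5 evenly spaced lines
    snapped = [int(x * scale + 0.5) for x in edges]  # snap to pixel grid
    gaps = [b - a for a, b in zip(snapped, snapped[1:])]
    print(gaps)   # [13, 12, 13, 12] -- drawn evenly, snapped unevenly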

> Trying to compose rendered vectors using pixel based operations is madness, so... don't?

Yes, of course.

Except that, too, is hard. Think about web browsers: they need to support arbitrarily large layers for composition (like extremely long text in an overflow: scroll div), and these layers can nest in arbitrarily deep and complex trees. Any node in this tree can apply transformations, masks, filters, drop shadows… In theory, most of this stuff should be doable without ever leaving vector land, but it’s absolutely not without its challenges.
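
To illustrate what “never leaving vector land” means, a toy sketch of the happy path (3x3 matrices via numpy; masks, filters, and shadows, i.e. exactly the hard parts, are glossed over):

    import numpy as np
    from dataclasses import dataclass, field

    @dataclass
    class Layer:
        paths: list                  # vector paths, in local coordinates
        transform: np.ndarray        # 3x3 local-to-parent matrix
        children: list = field(default_factory=list)

    def flatten(layer, parent_tf=np.eye(3)):
        # Compose transforms down the tree; nothing is rasterized until
        # the flattened list is drawn in one final pass.
        tf = parent_tf @ layer.transform
        out = [(path, tf) for path in layer.paths]
        for child in layer.children:
            out.extend(flatten(child, tf))
        return out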

> Or, just don't composite at all. A decade later, Linux desktop compositors are still an ergonomic regression vs. existing display drivers with vsync and double buffering support.

Hrm… I’m not talking about desktop compositing. Even modern desktop compositors render surfaces at whole-pixel positions, so that doesn’t really cause any additional issues. I’m talking about the kind of compositing that GTK or Firefox does.

That said, I do agree that desktop compositing on Linux, especially X11, has been less than ideal. It certainly isn’t standing still, though; the situation with compositing on Wayland and open-source GPU drivers has been much more promising. You still get a lot of the trademark issues that are pretty much inherent to compositing, but I have perfect vsync with good frame pacing and a solid two-frame end-to-end latency in Chromium on SwayWM. I believe that’s close to ideal for a surface running under a compositor, and a far cry from the compromise-riddled world of old GPU-accelerated compositing.


The underlying logic for rendering "hinted" line borders and UI widgets is a lot simpler than for hinting arbitrary text. It's a matter of snapping a few key control points to the pixel grid, and making sure that key line widths take up integer numbers of pixels. Much of the complexity you point out only arises because we now insist on having physically sized rendering for "mixed-DPI" graphics, like a single window spanning both a low- and a high-resolution display. That's not necessarily a very sensible goal, and it's not something that would've been insisted on back when achieving "pixel perfect" rendering was in fact a major concern, regardless of display resolution.

A similar concern is the demand for arbitrary subpixel positioning of screen content, which basically only matters in the context of on-screen animations. Nobody really cares if an animation looks blurry, but it's somewhat more important for static content to look right. Trying to have one's cake and eat it too will always be harder than just focusing on what's actually important for good UX.


> The underlying logic for rendering "hinted" line borders and UI widgets is a lot simpler than for hinting arbitrary text. It's a matter of snapping a few key control points to the pixel grid, and making sure that key line widths take up integer numbers of pixels.

This is exactly what I was “hinting” at when I talked about coming up with a universal function that works for everything. You can’t just snap some or all things to a pixel grid; it would look absolutely terrible, because it would make lines and whitespace uneven. Even font autohinting, which does exist, is more sophisticated than just aligning key control points to a pixel grid.

> Much of the complexity you point out only arises because we now insist on having physically sized rendering for "mixed-DPI" graphics, like a single window spanning both a low- and a high-resolution display. That's not necessarily a very sensible goal, and it's not something that would've been insisted on back when achieving "pixel perfect" rendering was in fact a major concern, regardless of display resolution.

It’s not. Even under Wayland, which can achieve this, the application only renders one surface at one resolution at any given time. Nothing I’ve been talking about is related to splitting a window across screens of different DPI.

> A similar concern is the demand for arbitrary subpixel positioning of screen content, which basically only matters in the context of on-screen animations. Nobody really cares if an animation looks blurry, but it's somewhat more important for static content to look right. Trying to have one's cake and eat it too will always be harder than just focusing on what's actually important for good UX.

If you scale a UI that was designed for 96 DPI to a screen of around 160 DPI, you already have subpixel positions everywhere. If you then snap to the pixel grid instead of rendering elements at subpixel positions, you get uneven, ugly-looking UI elements.
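
To put numbers on it: 96 to 160 DPI is a scale factor of 5/3, so there is no whole-pixel width that faithfully renders a 1-logical-pixel border.

    scale = 160 / 96        # = 5/3, about 1.667
    border = 1 * scale      # a 1-logical-pixel border: 1.667 physical px
    print(round(border))    # 2 -> about 20% too thick
    print(int(border))      # 1 -> about 40% too thin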

This unevenness is arguably more tolerable for text than for UI elements, but Microsoft actually chose to avoid it for text regardless: to make text look cleaner, Microsoft UIs use more aggressive gridfitting, with each glyph being fit to the pixel grid. This is exactly why old Windows UI scaling led to cut-off text and other oddities: the gridfitting produced text with different logical widths when rendered at different resolutions!

You can’t just wish away subpixels. Numbers that just happen to be whole numbers are the real edge cases in a world with arbitrary scale factors.


> it would make lines and whitespace uneven

Are we talking about single-pixel rounding errors, or something else? The former are already practically undetectable at 1080p, and nearly so at 768p. Given a high standard of "pixel-perfect" rendering, there's basically zero reason to push resolution any higher!

Of course, one can even make pure subpixel-based rendering (no fitting to pixels at all) look correct, by starting either from pure vectors or from a higher-resolution raster and then using a Lanczos-style filter to preserve perceived sharpness near the resolution limit of the display. This gets us as near as practicable to something that's almost "pixel perfect", without distorting spatial positions to make them precisely fit a pixel grid.
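
For reference, the Lanczos kernel in question is just a windowed sinc (a = 2 or 3 are the usual choices):

    import math

    def lanczos(x, a=3):
        # sinc(x) * sinc(x / a), windowed to |x| < a.
        if x == 0.0:
            return 1.0
        if abs(x) >= a:
            return 0.0
        px = math.pi * x
        return a * math.sin(px) * math.sin(px / a) / (px * px)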


> some software has started using SDFs for font scaling

My "wip/chergert/glyphy" branch of GTK 4 does rendering using https://github.com/behdad/glyphy which uses fields to create encoded arc lists and are uploaded to the GPU in texture atlases. The shaders then use that data to render the glyph at any scale/offset.

Some work is still needed to land this in GTK 4, particularly around path simplification (mostly done) and slight hinting (which will probably land in harfbuzz).


Regarding slight hinting... currently GTK 4 hints glyphs (distorting them by quantizing their vertical positioning) and then renders them at fractional vertical positions (resulting in blurry horizontal lines). This is the worst of both worlds: it achieves neither the scale-independent rendering of unhinted glyphs with fractional positioning, nor the sharpness of hinted glyphs at integer vertical positions. What is your plan for hinting and positioning?



