> For example, if you want a crisp 1px border on 96 dpi, you could specify it to be a 1px border at 96 dpi… but then what happens at 1.5x or 1.75x scale?
The border width should get snapped to the physical (sub-)pixel resolution as part of rendering. Typically, this should come with changes in contrast too, such that if a line is forced to become thinner it also gets drawn with higher contrast wrt. the surroundings, and vice versa. All of this stuff can be made to work.
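A rough sketch of what that snapping could look like, assuming the renderer knows the scale factor and can vary the border's opacity. The function name `snap_border` and the rule of keeping width times opacity roughly constant are illustrative assumptions, not something any particular toolkit is known to do:

```python
import math

def snap_border(width_logical: float, scale: float, alpha: float = 1.0):
    """Snap a logical border width to whole device pixels.

    Hypothetical helper: compensates for the width change by scaling
    opacity so that (width * alpha) stays roughly constant.
    """
    ideal = width_logical * scale                 # ideal width in device pixels
    snapped = max(1, math.floor(ideal + 0.5))     # round half up, at least 1 px
    # Thinner than ideal -> raise contrast; thicker than ideal -> lower it.
    adjusted = min(1.0, alpha * ideal / snapped)  # opacity cannot exceed 1.0
    return snapped, adjusted

# A 1px border at 1.5x becomes 2 device pixels drawn at 75% opacity.
print(snap_border(1.0, 1.5))
# A 60%-opaque 1px border at 1.25x stays 1 device pixel but gets darker.
print(snap_border(1.0, 1.25, alpha=0.6))
```

Note the cap: a line that is already fully opaque cannot gain contrast when it is forced thinner, so the compensation only works in one direction in that case.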
Also, if you are forced to render a canvas at a resampled resolution because existing APIs give you no other choice, at least do it right using a proper Lanczos-style resampling. This might end up with a quaint "watercolor" effect but guess what, that's a lot better than a blurry, eye-fatiguing mess.
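For reference, a minimal one-dimensional sketch of Lanczos resampling, assuming a row of grayscale samples in a plain Python list. The names `lanczos` and `resample_row` are made up for this example; a real implementation would work in two dimensions, in linear light, and on the GPU or with SIMD:

```python
import math

def lanczos(x: float, a: int = 3) -> float:
    """Lanczos kernel: sinc(x) * sinc(x / a) inside |x| < a, zero outside."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def resample_row(src: list[float], dst_len: int, a: int = 3) -> list[float]:
    """Resample one row of samples to dst_len samples (illustrative only)."""
    scale = len(src) / dst_len
    stretch = max(scale, 1.0)        # widen the kernel when downscaling
    support = a * stretch
    out = []
    for i in range(dst_len):
        center = (i + 0.5) * scale - 0.5          # position in source space
        lo = math.floor(center - support) + 1
        hi = math.floor(center + support)
        acc = 0.0
        wsum = 0.0
        for j in range(lo, hi + 1):
            w = lanczos((j - center) / stretch, a)
            acc += w * src[min(max(j, 0), len(src) - 1)]  # clamp at the edges
            wsum += w
        out.append(acc / wsum)                    # normalize the weights
    return out
```

The negative lobes of the kernel are what preserve perceived sharpness, and they are also why results can overshoot the input range near hard edges, so output usually needs clamping before display.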
We’ve tolerated a great degree of complexity just to make fonts look good at 96 DPI. Looks like we’re able to tolerate a bit more complexity to enable GPU rendering. However, many years into having high DPI displays, it’s not obvious people are willing to take the complexity to make low DPI and high DPI screens look good simultaneously.
The thing is, with fonts, we already bear the burden of font rendering being complex because that was needed for 96 DPI displays. But, we won’t need much of this magic or complexity when a vast majority of people are using higher DPI displays, because at >200 PPI the difference between a blurry line and a sharp line is basically nil. That is obvious enough on Apple platforms, where many are perfectly happy with the scaling even though it uses 2x as a base for all scales.
I think the future is simply pain. People want cleaner graphics pipelines, and only high DPI displays will get them anywhere.
I’ve come to the same conclusion. Making hi(-ish)-DPI work would be possible with the right APIs. But it’s virtually impossible to also make it work for traditional low-DPI displays at the same time. The departure from pixel-art icons to vector icons alone has already degraded the low-DPI experience substantially. It doesn’t help that developers and designers tend to not use low-DPI displays anymore. But many regular users will, because it continues to be the cheaper option, also in GPU terms for gamers. Full-HD monitors won’t be going away anytime soon. Meanwhile, the mid-DPI space (e.g. 1440p) is in an uncanny valley, often requiring fractional scaling (more than 100%, less than 200%) unless you have excellent eyesight.
The font rasterizer is a massive hack in modern UIs. Subpixel rendering is a serious pain in the ass. When you render text using subpixel rendering, you render the actual vectors at 3x the spatial resolution. But not simply as if the vectors were 3x wider, because that would look too sharp: it needs to render as if there were 3x as many pixels, which is different.
Then there’s compositing. Normal layers can be composited using alpha blending, assuming some sane format like premultiplied alpha RGBA. But not subpixel rendered text, because alpha blending the components will fuck up the subpixel rendering.
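A small sketch of why the two don't mix, assuming colors as floats in the 0 to 1 range and an opaque destination for the text case. The function names are invented for illustration. Ordinary layers need one alpha per pixel, while subpixel text needs a separate coverage value per color channel, which a single RGBA layer has nowhere to store:

```python
def over_premultiplied(src, dst):
    """Porter-Duff 'over' for premultiplied (r, g, b, a) tuples."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    k = 1.0 - sa
    return (sr + dr * k, sg + dg * k, sb + db * k, sa + da * k)

def blend_subpixel_text(text_rgb, coverage_rgb, dst_rgb):
    """Blend text onto an opaque background with per-channel coverage."""
    return tuple(
        t * c + d * (1.0 - c)
        for t, c, d in zip(text_rgb, coverage_rgb, dst_rgb)
    )

black, white = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
coverage = (1.0, 0.5, 0.0)   # glyph edge covers R fully, G half, B not at all

# Correct: each channel is blended with its own coverage.
print(blend_subpixel_text(black, coverage, white))       # (0.0, 0.5, 1.0)

# Collapsing coverage to one alpha (here the mean) loses the subpixel edge.
mean = sum(coverage) / 3.0
print(blend_subpixel_text(black, (mean,) * 3, white))    # (0.5, 0.5, 0.5)
```

The second result is plain grayscale anti-aliasing: once the coverage has been flattened into a single alpha channel, the extra horizontal resolution is gone.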
And it goes on, because if you want to handle text like everything else, you need special cases for it to look right. Rotation? Need to render the vectors rotated; can’t rotate in raster. If you need to render to a surface then transform that surface, you’re SOL; it can’t go to rasters until the end.
Normal surfaces can also be rendered at subpixel positions, and of course this does not work for surfaces containing text, because again, it will destroy the subpixel rendering.
OK. So you can get rid of the subpixel rendering and render slightly blurrier glyphs instead. (R.I.P. anyone trying to tell hanzi/kanji apart.) It’s still going to murder legibility if you move it over by a subpixel value because text is already on the edge of readability at 96 DPI.
I haven’t considered gamma correction, hinting, blending different colors, different blending modes, GPU acceleration, etc. because I simply don’t have the brain power to try to reconcile it all. It’s a nightmare.
We already did some of this for text. Which is a herculean effort. We use a freakin virtual machine to power font hinting, and ugly, complex, slow special casing at many layers of already ridiculously complex vector graphics stacks. (I mean, if you disagree with that assessment, you may just be smarter than I am, but I have serious trouble following the Skia codebase and I doubt Cairo is really that much better.) And speaking of which, there only really seem to be a handful of them out there: there’s Skia, used by most web browsers; Cairo, used by GTK; Direct2D, in Windows; whatever modern macOS uses that isn’t QuickDraw anymore; and I guess there’s Mozilla’s Pathfinder, a promising Rust-based vector graphics engine that was built as part of Servo and seemingly mostly abandoned, much to the world’s detriment. This work is hard. It can be done, but it’s not something I think a single engineer can do, if you want to build one that competes with the big boys even disregarding a few things like performance. I’d love to be wrong, but I have a sinking feeling I’m not.
Even text isn’t done being overcomplicated. As nyanpasu has mentioned above, some software has started implementing SDFs for font scaling. We do this because text legibility is really that important, whereas a line in the UI being slightly blurry for users on older screens is really just not that important. Some languages flat out can’t be read with crappy font rendering, and any of them will give you eyestrain if it’s ugly enough. As much as it sucks, a blurry border on a button doesn’t have an accessibility issue. And rendering at 1x and making the compositor upscale is not a great solution either because again, it’s already hard enough to read text in some languages; the added blurriness of scaling text and ruining subpixels is basically intolerable.
These hacks aren’t free, and with high DPI displays, they’re not needed. There’s a reason Apple did what they did.
OK, but there's clearly an existence proof, and it ran fine on 32 bit machines with slow processors (or even embedded CPUs in the 80's!) way before all the piled hacks you are describing were invented.
As I understand it, all that's needed is a vector renderer, and you keep everything (even text) in vector format as long as possible. RGBA then becomes a special case, as it must be for any DPI independent rendering pipeline.
Trying to compose rendered vectors using pixel based operations is madness, so... don't?
That means you can't have a bitmap-based compositor. So what? GPU's are great at rendering vectors. Composite those instead of bitmaps.
Or, just don't composite at all. A decade later, Linux desktop compositors are still an ergonomic regression vs. existing display drivers with vsync and double buffering support.
> OK, but there's clearly an existence proof, and it ran fine on 32 bit machines with slow processors (or even embedded CPUs in the 80's!) way before all the piled hacks you are describing were invented.
Yes. Driving ~1024x768 framebuffers, on single core processors, with far less demanding workloads, but still, yes. (They still badly needed good glyph caching to accomplish this.) (I’m assuming a Windows XP-tier machine since that was the era most people started using ClearType/subpixel rendering.)
(Single core processors are obviously slower than multicore processors, all else equal, but exploiting multi-core processors effectively is harder and often leads to code that is at least a bit slower in the single-core case…)
> As I understand it, all that's needed is a vector renderer, and you keep everything (even text) in vector format as long as possible. RGBA then becomes a special case, as it must be for any DPI independent rendering pipeline.
I don’t want to sound like I’m being patronizing, but I get the feeling that you may not be grasping the problem.
We can’t just use text rendering logic to power other vector graphics. For many reasons. Text is not just rendered like vectors, as that would simply be too blurry at 96 DPI. Old computers used bitmap fonts or aggressive hinting, and newer computers use anti-aliasing, often with subpixel anti-aliasing. Doing that with every line on screen isn’t feasible even if you wanted to write the code. Here’s an attempt to enumerate just the obvious reasons why:
- It’s slow. Yes, old 32 bit computers could do it, yadda yadda ya. But they did it for text. At the glyph level. And then cached it. They were most certainly not rendering anything near the entire size of the framebuffer at once.
- It’s difficult to GPU-accelerate. GPUs can do vector graphics and alpha blending fast, but subpixel rendering as it’s done with text is not something that can be done using typical GPU rendering paths. It could still be made to exploit GPUs, but it requires more work and is slower.
- Fonts achieve better crispness on lower DPI displays using hinting VMs. Without them, many glyphs would be quite blurry. Hinting VMs allow typographers making font outlines to make specific decisions about when and how vectors should be adjusted to look good on raster displays. In case it isn’t obvious, the problem here is that doing this for every line on the screen requires you to write special casing for every line on the screen. Maybe you could come up with a general rule that makes everything look good and never winds up with uneven-looking margins or outlines (you really can’t, but…); even then, you have to run this logic for every line. That’s an increase in complexity.
- Glyphs only need to care about their relationships with each other. UI elements on screen have arbitrary concerns. They have relationships with other things on screen; they line up with other shapes and the whitespace between them is significant. Glyphs only care about other glyphs horizontally adjacent to them (or vertically in some scripts, perhaps) but other UI elements care about their relationship with potentially any neighboring UI elements.
- UI rendering code does not exist in a vacuum. At some point, apps will need to do something that requires them to know the size of something on screen either in physical or logical dimensions. Normally, this isn’t a problem, but if all vector rendering was as complex as text, it would absolutely be an issue. The naive way of handling it would seem correct in many cases, but it would be wrong in many others, just like how old APIs that expose pixels instead of logical units tend to lead to apps with subtle scaling issues.
> Trying to compose rendered vectors using pixel based operations is madness, so... don't?
Yes, of course.
Except that, too, is hard. Think about web browsers: they need to support arbitrarily large layers for composition (like extremely long text in an overflow: scroll div), and these layers can nest in arbitrarily deep and complex trees. Any node on this tree can apply transformations, masks, filters, drop shadows… In theory, most of this stuff should be doable without ever leaving vector land, but it’s absolutely not without its challenges.
> Or, just don't composite at all. A decade later, Linux desktop compositors are still an ergonomic regression vs. existing display drivers with vsync and double buffering support.
Hrm… I’m not talking about desktop compositing. Even modern desktop compositors render surfaces at pixel positions, so it doesn’t really cause any additional issues. I’m talking about the kind of compositing that GTK or Firefox do.
That said, I do agree that desktop compositing on Linux, especially X11, has been less than ideal. However, it certainly isn’t standing still; the situation with compositing on Wayland and open source GPU drivers has been much more promising. You still get a lot of the trademark issues with compositing that are pretty much inherent, but I have perfect vsync with good frame pacing and a solid 2 frame latency end-to-end in Chromium on SwayWM. I believe that’s close to ideal for a surface running under a compositor. A far cry from the compromise-riddled world of old GPU accelerated compositing.
The underlying logic for rendering "hinted" line borders and UI widgets is a lot simpler than for hinting arbitrary text. It's a matter of snapping a few key control points to the pixel grid, and making sure that key line widths take up integer numbers of pixels. Much of the complexity you point out only arises because we now insist on having physically sized rendering for "mixed-DPI" graphics, like a single window spanning both a low- and a high-resolution display. That's not necessarily a very sensible goal, and it's not something that would've been insisted on back when achieving "pixel perfect" rendering was in fact a major concern, regardless of display resolution.
A similar concern is the demand for arbitrary subpixel positioning of screen content, that basically only matters in the context of on-screen animations. Nobody really cares if an animation looks blurry, but it's somewhat more important for static content to look right. Trying to have one's cake and eat it too will always be harder than just focusing on what's actually important for good UX.
> The underlying logic for rendering "hinted" line borders and UI widgets is a lot simpler than for hinting arbitrary text. It's a matter of snapping a few key control points to the pixel grid, and making sure that key line widths take up integer numbers of pixels.
This is exactly what I was “hinting” at when I talked about coming up with a general rule that makes everything look good. You can’t just snap some/all things to a pixel grid; it would look absolutely terrible because it would make lines and whitespace uneven. Even font autohinting, which does exist, is more sophisticated than just aligning key control points to a pixel grid.
> Much of the complexity you point out only arises because we now insist on having physically sized rendering for "mixed-DPI" graphics, like a single window spanning both a low- and a high-resolution display. That's not necessarily a very sensible goal, and it's not something that would've been insisted on back when achieving "pixel perfect" rendering was in fact a major concern, regardless of display resolution.
It’s not. Even under Wayland, which can achieve this, the application would only render one surface at a specific resolution at any given time. Nothing I’ve been talking about is related to being able to split a window across different DPI screens.
> A similar concern is the demand for arbitrary subpixel positioning of screen content, that basically only matters in the context of on-screen animations. Nobody really cares if an animation looks blurry, but it's somewhat more important for static content to look right. Trying to have one's cake and eat it too will always be harder than just focusing on what's actually important for good UX.
If you scale a UI that was designed for 96 DPI pixels to a screen that is around 160 DPI, you already have subpixels. If you then attempt to snap to a pixel grid instead of rendering elements at subpixel positions, then you have uneven, ugly looking UI elements.
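A quick way to see the unevenness, assuming five edges spaced 4 logical pixels apart and a scale factor of 160/96. The helper name `snapped_gaps` and the layout are made up for the example:

```python
import math

def snapped_gaps(logical_positions, scale):
    """Scale positions, snap each to the device pixel grid, return the gaps."""
    device = [math.floor(p * scale + 0.5) for p in logical_positions]
    return [b - a for a, b in zip(device, device[1:])]

edges = [0, 4, 8, 12, 16]                 # evenly spaced in logical pixels

print(snapped_gaps(edges, 2.0))           # [8, 8, 8, 8]  integer scale: even
print(snapped_gaps(edges, 160 / 96))      # [7, 6, 7, 7]  fractional: uneven
```

The ideal gap at 160/96 is about 6.67 device pixels, so independent snapping has to hand out a mix of 6s and 7s, and one gap ends up visibly narrower than its neighbors.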
This unevenness is arguably more tolerable for text than it is for UI elements, but Microsoft actually took the approach of not having it for text regardless; to make text look cleaner, text uses more aggressive gridfitting in Microsoft UIs, resulting in each glyph being gridfit. This is exactly why old Windows UI scaling led to cut off text and other text oddities; it’s because the grid fitting led to text that had different logical widths when rendered at different resolutions!
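A toy model of that effect, assuming gridfitting amounts to rounding each glyph's advance to a whole device pixel. Real hinting is far more involved, and the advance of 0.54 em is a made-up number rather than a measurement from any real font:

```python
import math

def gridfit_width(advances_em, font_px):
    """Line width in device pixels when every advance is rounded to a pixel."""
    return sum(math.floor(a * font_px + 0.5) for a in advances_em)

advances = [0.54] * 10            # ten hypothetical glyphs, 0.54 em each

w_100 = gridfit_width(advances, 12)     # 12px font at 100% scale
w_125 = gridfit_width(advances, 15)     # same font at 125% scale (15px)

print(w_100)            # 60 device px = 60 logical px
print(w_125 / 1.25)     # 80 device px = 64 logical px
```

A label box laid out to fit the 60 logical pixels measured at 100% is 75 device pixels wide at 125%, but the gridfit text now needs 80, so the end of the string gets cut off.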
You can’t just wish away subpixels. Numbers that just happen to be whole numbers are the real edge cases in a world with arbitrary scale factors.
Are we talking about single-pixel rounding errors, or something else? The former are already practically undetectable at 1080p, and nearly-so at 768p. Given a high standard of "pixel-perfect" rendering, there's basically zero reason to push resolution any higher!
Of course one can even make pure subpixel-based rendering (no fitting-to-pixels at all) look correct, by starting either from pure vectors or from a higher-resolution raster and then using a Lanczos-style filter to preserve perceived sharpness near the resolution limit of the display. This gets us as near as practicable to something that's almost "pixel perfect", without distorting spatial positions to make them precisely fit a pixel grid.
> some software has started implementing SDFs for font scaling
My "wip/chergert/glyphy" branch of GTK 4 does rendering using https://github.com/behdad/glyphy which uses fields to create encoded arc lists that are uploaded to the GPU in texture atlases. The shaders then use that data to render the glyph at any scale/offset.
Some work is still needed to land this in GTK 4, particularly around path simplification (mostly done) and slight hinting (probably will land in harfbuzz).
Regarding slight hinting... currently GTK4 hints glyphs (distorting glyphs by quantizing vertical positioning) then renders them at fractional vertical positions (resulting in blurry horizontal lines). This is the worst of both worlds, achieving neither the scale-independent rendering of unhinted glyphs with fractional positioning, nor the sharpness of hinted glyphs with integer vertical positions. What is your plan for hinting and positioning?