Digital Audio Workstation Front End Development Struggles (billydm.github.io)
347 points by ingve on May 16, 2023 | 265 comments



> Both Javascript and the DOM are slow, there's no changing that.

The DOM isn't slow. Adding or removing a DOM node is literally a few pointer swaps, which isn't much more than setting a property on an OOP language object. What is slow is "layout". When JavaScript changes something and hands control back to the browser, the browser runs its style recalc, layout, paint and finally compositing passes to redraw the screen. The layout algorithm is quite complex and it's synchronous, which makes it stupidly easy to make things slow. Frameworks like React with virtual DOMs help you by redrawing on every state change, which will typically both speed things up significantly and also carry you through a lot of things that would break if you were using non virtual DOM frameworks like JQuery. You can still get smooth and quick animations with JavaScript and the DOM, but only if you build things to limit what your application can do in a frame to operations which can be performed on the GPU.
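To make the "stupidly easy" part concrete, a minimal sketch (the class name is made up): interleaving layout reads and style writes forces a synchronous layout on every iteration, while batching lets the browser lay out once.

  const rows = document.querySelectorAll('.row'); // hypothetical elements

  // Slow: each offsetHeight read flushes pending style/layout work,
  // and each style write invalidates it again (layout thrashing).
  rows.forEach((el) => {
    el.style.height = (el.offsetHeight + 10) + 'px';
  });

  // Fast: batch all the reads, then all the writes, so layout runs once.
  const heights = [...rows].map((el) => el.offsetHeight);
  rows.forEach((el, i) => {
    el.style.height = (heights[i] + 10) + 'px';
  });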

I can understand if that's not exactly an option for the author of this article, but I'm not convinced you couldn't get Electron (or similar) to work as a GUI framework for a DAW. I suspect Electron specifically might actually be a pretty good framework for DAWs, considering what Microsoft is doing with VSC. I also think that the downside of frameworks like Electron is that you need to be, not only Microsoft, but the Microsoft VSC team (and not the Microsoft Teams team) specifically, to bend them to your way.

That being said, I think this will likely be one of those things where you'd need to create a GUI framework (or bend one to your will) to really get DAWs to work for you. Because I think the author is spot on about the "state" of GUI frameworks in general.


> Frameworks like React with virtual DOMs help you by redrawing on every state change, which will typically both speed things up significantly and also carry you through a lot of things that would break if you were using non virtual DOM frameworks

A virtual DOM based framework will not be faster than manual DOM manipulation with either vanilla JS or even JQuery. It certainly makes development easier, and massively reduces the risk of bugs. But there is no magic to a VDOM and speed; technically you are usually doing more work, building a VDOM and diffing it.

Edit:

Rich Harris of Svelte has a good post explaining:

> Virtual DOM is pure overhead. Let's retire the 'virtual DOM is fast' myth once and for all

https://svelte.dev/blog/virtual-dom-is-pure-overhead


> But there is no magic to a VDOM and speed

There absolutely is. If you are using standard DOM APIs and are first changing the text of a <span> element, then changing it back to what it was before (which can easily happen if your application logic is sufficiently complex, with multiple piecewise state changes impacting the UI), the browser will have to re-render the element twice, which means two full re-layouts unless the element is absolutely positioned or something.

By contrast, VDOM-based frameworks will detect that the DOM hasn't actually changed, and as a result, the browser doesn't have to do anything (other than update the VDOM).

That's indeed "magic", and makes a tremendous performance difference with complex applications in practice.


No, in a correctly implemented pure DOM manipulation app you would update the DOM once to match your state; that's faster than building a VDOM and diffing it. Modifying the DOM twice, with an unintentional reflow and redraw between, would be a bug in your code.

Yes, "if your application logic is sufficiently complex, with multiple piecewise state changes impacting the UI" is what a VDOM helps with from a developer perspective. But it isn't magically "faster".

Personally I'm a fan of reactive frameworks such as Vue that do use a VDOM, but, by also using reactive state, can do fine-grained partial rendering of that VDOM for efficiency.

Would I build an app without a (probably VDOM-based) framework? No, I wouldn't.

Also note that the "fastest" frameworks, such as Solid, don't use a VDOM; they track state changes and do fine-grained DOM manipulation to match.
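To give a minimal sketch of what "update the DOM once to match your state" means (the element id and state shape are made up):

  const label = document.getElementById('bpm-label'); // hypothetical element
  let lastBpm;

  function setBpm(bpm) {
    if (bpm === lastBpm) return;      // nothing changed: zero DOM work
    lastBpm = bpm;
    label.textContent = bpm + ' BPM'; // one targeted mutation, no VDOM, no diff
  }

That's more or less what a fine-grained framework like Solid compiles your components down to.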


Who needs memory safety? Just correctly use malloc and free on every possible code path. Easy and simple!


VDOM isn't the equivalent of memory safety (such as using Rust over C); it's much more similar to using a garbage-collected high-level language, such as Python or Ruby, over C.

Your implied comparison doesn't work.


VDOM has nothing to do with garbage collection. VDOM is a cache.


I do not use any frameworks for browser based JS/HTML front ends. Just plain JS and some domain specific libs when needed. No problems so far and the resulting GUIs are fast.


So different from the Netscape / Internet Explorer 3 days.

Especially anything to do with corporate, which insisted on ancient versions of Internet Explorer.

You could write the app, then almost have to rewrite it to get it to work with Internet Explorer.


> If you are using standard DOM APIs and are first changing the text of a <span> element, then changing it back to what it was before ... the browser will have to re-render the element twice

As far as I understand, not if this all happens within a single browser layout/paint cycle [0]

- [0] https://twitter.com/jaffathecake/status/1552242563654066176
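For instance, in a sketch like this both writes land in the same task, so the browser computes style and layout at most once before the next paint (assuming nothing in between forces a synchronous read like offsetHeight):

  const span = document.querySelector('span');
  const old = span.textContent;
  span.textContent = 'something else';
  span.textContent = old; // net change: none
  // No layout has run yet; the browser coalesces both writes and does
  // its style/layout/paint work asynchronously, before the next frame.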


You wrote this later and I apologise for bringing it up here:

> a correctly implemented pure DOM manipulation app you would update the DOM once to match your state; that's faster than building a VDOM and diffing it

But that is sort of my point. Maybe I could have been clearer about it, but my intent was to say that the VDOM frameworks are faster (or perhaps safer is the right word) for people who aren’t doing things correctly. Which among others is also me.


No worries, and the rest of your comment (DOM actually fast, render+reflow slow, Electron a good choice) is good. It just bugs me when people repeat the myth that VDOM is faster at rendering than native DOM (or don't clearly explain otherwise); it encourages people to reach for a tool they don't necessarily need.

As I have said in other comments, VDOM based frameworks are good, people should use them, but not because they are "magically" faster.


Surplus, which predates Svelte and does much the same thing, also uses no VDOM and has topped performance benchmarks for a long time.


Huh? You're not doing any of this; the framework does it automatically. The virtual DOM and DOM change reconciliation makes it faster. DOM changes are really slow. The reconciliation makes sure only the parts are re-rendered which have changed. Failing that, the full screen is re-rendered. If you're doing manual VDOM building you're doing something very wrong.


> If you're doing manual VDOM building you're doing something very wrong.

Rendering a React or Vue component is building a VDOM; that's what I'm referring to. Your code is explicitly building a VDOM.

> The reconciliation makes sure only the parts are re-rendered which have changed.

Yes, and that makes development easier.

That reconciliation is only "faster" if your alternative non-VDOM code is poorly written, throwing away chunks of DOM and rebuilding from scratch.

I'm not advocating against using VDOM frameworks (I use them); I'm advocating for understanding that they are not faster than the DOM. That is a myth, and it results in people misunderstanding the tools or how the DOM and browsers work.


Yeah, I definitely agree that a VDOM doesn't necessarily speed up anything on its own. There are many frameworks that benchmark better than React (for example Svelte and Lit) that don't have a VDOM at all.


And yet, InfernoJS is faster than Svelte while using a VDOM.


I quickly glanced at the documentation and I'm greatly disappointed that the concept of "circles" is not there. Devs, come on!


lol, took me a bit to realize what you were talking about


> What is slow is "layout".

Right. My understanding is that, as long as you're limiting lots of graphical updates to fixed-size areas, the DOM isn't really a problem.

When you resize the window it will be slow. But if you're just animating a single knob being dragged, it's totally smooth. Similarly, if you're blitting a bit array to a canvas, it's plenty fast.

The stuff someone needs to do with a DAW interface strikes me as quite perfectly suited to the DOM actually, precisely because the layout is extremely modular. You're not dealing with long strings of text that keep unpredictably wrapping and pushing elements and therefore recalculating the entire layout. It's pretty ideal, really.
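E.g. a knob can be animated with a transform alone; a minimal sketch (the element id is made up):

  const knob = document.getElementById('cutoff-knob'); // hypothetical element

  function setKnobAngle(deg) {
    // transform (like opacity) doesn't invalidate layout; the browser can
    // typically animate it on the compositor, i.e. entirely on the GPU
    knob.style.transform = 'rotate(' + deg + 'deg)';
  }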


> I suspect Electron specifically might actually be a pretty good framework for DAWs, considering what Microsoft is doing with VSC

VSC always feels noticeably slower to me than GTK / Qt / Win32 apps


I spent some time last year building out a UI framework for VSTs based on embedded HTML/JS, after being frustrated with the terrible VST3 GUI package from Steinberg. I think it showed promise, but I'm unable to continue to make progress with it:

https://github.com/rdaum/vstwebview

It's not Electron, but it's Electron-like. And it was totally fine, performance wise. Remember this is the UI layer, not the audio rendering layer. On a multicore system, these are two different threads of execution that should not be impacting each other, if you do it right.

It'd be great if somebody were to run with it. I feel I made good initial progress. These days, I have to concentrate on my paying work (in Rust, not C++, and not in the audio domain) instead.

Rant: HTML/JS UIs are not intrinsically slow. V8 is highly tuned. Rendering in Chromium is highly tuned. I worked on a team @ Google (Nest "Home Hub") that got Chromium running on very low-spec ARM SoC devices with very acceptable rendering performance. So many hundreds of thousands of developer hours have been spent on improving browser rendering performance that it's preposterous and dangerously dogmatic to insist that some homegrown solution will be intrinsically superior performance-wise while still being robust and feature-complete and offering... luxuries... like accessibility or keyboard shortcuts, etc.

Do I personally like writing web UIs? No. Do I want to write my own UI toolkit from scratch? Oh, that'd be neat, sounds fun. Do I trust anybody, including myself, to do that right? No. In particular I don't trust e.g. Steinberg to do this.


so an incalculable effort was spent making chromium just tolerable, but rest assured, still far far far from any kind of random native application, in both speed and memory usage.

simply opening a chromium tab with an empty html page uses more ram than my amiga had 30 years ago, and funnily enough, I was able to run applications on that

maybe i will be impressed if they manage to get the chromium abomination down to a more reasonable size, compared to the 1.7GB compressed size its source is now (how is it even possible?????).. I wonder, how big was khtml when webkit forked? how big was webkit when google forked?

this shit is totally out of control, and pretending anything else is quite a feat to do with a straight face, and for that, you have my admiration


I too am from that era of computing. That ship has sailed. There is no comparison in terms of the architectural complexity between then and now. You can decry all the layers of abstraction, and they have indeed brought amplification in the amount of memory and cycles necessary to do "simple" tasks. But they've also brought with them capabilities so far beyond what we did then.

I worked in the Chromium source tree. It is indeed massive and complicated. But I can assure you that none of us were ever "wooo let's just chew memory and CPU cycles" -- it is written with efficiency in mind all over. It's just cumulatively quite complicated what it is doing.

Can you write a GUI toolkit which throws away 99% of that cruft? Yes. But it will lack so many creature comforts that we've become used to. What we had back then... it lacked vector fonts, it lacked GPU acceleration, it lacked transparencies, it lacked multiple-DPI support, anti-aliasing, subpixel rendering, etc. It lacked accessibility. It lacked multiple keyboard layout support. It lacked proper multiple language support; it all predated Unicode and nobody even thought about right-to-left text flow, etc. Obviously no touch screen and gesture support. Hell, the Amiga widget kits barely had standardized components of much variety at all.

And that's not even bringing up the VM that runs JavaScript. You can bitch all you want about it -- but back then many of us salivated over Smalltalk-80 (another garbage-collected, dynamic, late-bound OO language) but our machines could not handle it. Turns out that to do that kind of system well and at scale, you need resources far beyond 80s and 90s computers. We're there now, but it took a long time.

All of this costs in software complexity.

Finally, memory usage is also a lot more complicated than it used to be. Firstly, we have so much more of it than before, so we can do a lot more trading off memory for speed than we did back then, when memory was stupidly expensive. Secondly, buffer management and allocation can have all sorts of subtleties, especially in the context of virtual memory. Something can show in top as using X memory, but the actual amount of memory pages in physical use and currently paged into the application may not be anywhere close to that. In a virtual memory system, memory accounting is a tricky subject.

Shit is not out of control. It's just a lot more complicated on the surface than you're willing to admit.


the khtml to webkit to chromium expansion is NOT in any way defensible in contrast to the amount of features. I sincerely hope that it is due to not caring even the tiniest amount about efficiency, because the alternative is not pleasant to think about.

funnily enough some software cares greatly about efficiency, namely software where it isn't just possible to buy a bigger cpu and more ram to make it "tolerable". they manage to care about practically every single instruction, and what do we get for it? great quality software


The DOM is fast for the kind of applications that most people create with React: sign-up forms, shopping carts, blogs, and other web sites that people used to be able to create before React using HTML directly but that web developers now look at like Stonehenge or the Egyptian Pyramids.


I don't know about the performance, but the web is THE most capable platform that is itself cross-platform.

As for performance, a text editor with dozens of plugins and syntax highlighting is no joke either, and I have no issues on my MacBook Air with only two cores from a few years ago. I abandoned JetBrains IDEs for a web-based text editor (VSCode) and it works pretty well even on large code bases.


I think you have failed to comprehend what a DAW actually needs to do/render, quickly.


Can you share some examples where a DAW/plugin would need to update that much visual data onscreen with almost zero latency? I'm genuinely asking as someone who used to make music with orchestral/piano/drum VSTs a ton a few years ago.

I agree that the underlying audio should be as fast as possible, but I don't see why the UI couldn't "lag" behind by a few extra milliseconds for on screen rendering. Especially as a plugin.

Again, not trolling; genuinely curious!


Live manipulatable waveforms. Think DJ decks, etc. It tends to be one of those "tearing is better than stuttering" situations so it goes against a lot of compositing assumptions.


Imagine your keyboard lagging a few extra ms before the character you typed is visible on the screen.


Average keyboard latency is between 3-15ms depending upon the type of keyboard switches used and the connection type.


You're not wrong to assume so. I think I should have said that, minus the DAW and otherwise generally speaking, for let's say 95% of the use cases the web is pretty OK.

So instead of editing the comment above, I'd agree with your underlying point about DAWs and the complexity of the UI. It is not trivial.

As a backend engineer, the progress on the web platform certainly is dazzling, from WebAssembly to WebGPU, Flex, Grid, container queries and lately view transitions.


> I also think that the downside of frameworks like Electron is that you need to be, not only Microsoft, but the Microsoft VSC team (and not the Microsoft Teams team) specifically, to bend them to your way.

I am not sure this is true, Discord as one example is pretty good. As is Slack's desktop app*

*By "good" I mean, perform the task in a stable way and appear fairly 'native' in terms of UX (but not UI design of course)


> I am not sure this is true, Discord as one example is pretty good. As is Slack's desktop app*

I think you need to try out an actually good native app just to see how much of a ridiculous claim that is.


We did some pen & paper gaming over the pandemic in Discord and its most notable feature was draining any battery you threw at it in like three hours flat—best case. All I can figure is it was sitting there calculating digits of Pi in a hot loop, or mining Bitcoin or something.


Discord and Slack are both nicely optimized web apps but they're not very fast compared to native apps.

You can tell that there's a noticeable lag between when you type into discord versus the terminal.


I didn't realise the bar that latency had to get over was a comparison to a terminal!

I best throw every IDE or tool I've ever used in the bin!


Do you have any idea how fast modern CPUs are?

I have used more complex native applications on Windows 95 that had less latency than Discord does, on a 10900k 20-thread CPU.

Modern terminals are quite slow compared to how fast the CPU itself is, because of the myriad of layers they sit upon.

And Discord is so extremely slow it's not even funny. Having one single frame of additional lag between click and action on a modern PC probably means billions of instructions being executed. For a glorified IRC client, to render a screen?

The problem is we all have become complacent and lazy, so we are used to software which is utterly inefficient and slow. You should throw all your IDEs in the bin, if only there was one that felt as snappy as Turbo C++ did 30 years ago.


FYI, most terminals are on the slower side. For example, IntelliJ has a much lower input latency than most terminals. So it's not really an unreasonable expectation; many apps are just really damn slow.


We can start with something basic like a native chat app - e.g. Telegram is a nice example of an app that doesn't lag like ass when chatting.

Or an IRC client - see for example Textual.


Despite some recent issues I've had with the Telegram client on a certain rolling release distro, it is still by far the gold standard of cross-platform native apps in my mind.

IIRC it's all/mostly C++. Runs natively, can handle all sorts of weird desktop configs easily. It's just very snappy and a pleasure to use, even on fairly low-end systems. I've installed it on a 32-bit laptop from 2004 and it was very usable.

People make some statements about it being Russian owned, but at the end of the day I don't talk about anything important on there. If a Russian team wants to make a world-class cross-platform native chat app that I can direct friends/family to and know they won't have any issue getting started with, I'm happy to support it.

I hate the web-appification of every desktop app with a burning passion.


The slack desktop app often lags when I type...


It could be argued that in this sense, the Discord frontend team “is” a VS Code team (and not a MS Teams team)


I worked in that field for 8 years for a somewhat well-known plugin maker. We used JUCE a lot at the beginning, but ended up slowly moving to an OpenGL-backed renderer for performance-critical parts (e.g. VU meters and other audio-to-UI feedback), and JUCE for the rest of the UI scaffolding (layout, controls, menus etc).

Same thing went for the other parts of a plugin architecture: we wrote our own VST/AU/AAX interfaces with the DAW to overcome some of the issues we encountered with JUCE.

After I left the company, my former teammates rewrote the whole UI framework to a pure OpenGL implementation. I'm not sure if there's even a drop of JUCE in the codebase anymore.

Note: this was before the PACE acquisition (though we had good contacts with the PACE team, their founders being brilliant engineers).

So my advice to people starting in this field: use whatever is available now (JUCE is a treasure trove for beginners), then slowly optimise where needed.


Slint dev here.

I'm a bit disappointed to see that Slint is considered a non-starter. In fact, one of our users is currently working on exactly that (a VST plugin for an audio application) and they are about to release it very soon. Unfortunately, I can't share a link until it's officially released.

The author seems to dismiss declarative UI without explaining why. From my perspective, declarative is the best way to describe UI.

Regarding the limited support for custom widgets, I disagree. In Slint, it's incredibly easy to create custom widgets. A widget is simply a component that can be created and utilized.

While it's true that our desktop integration is still a work in progress, we have already implemented the core functionalities needed for building such UIs. We are actively developing features based on user needs and prioritize implementing features that our users require immediately. If you're interested in using Slint, please reach out to us, and we'll ensure that we implement what is necessary for your product.


> The author seems to dismiss declarative UI without explaining why. From my perspective, declarative is the best way to describe UI.

…but I mean, that’s an opinion that is not falsifiable.

If the author disagrees, that’s up to them right?

I agree with the author too; after using XAML, my taste for declarative UIs is limited.

Here’s something that is falsifiable: what is possible using declarative UIs is a strict subset of what is possible using imperative code.

If your declarative config is Turing complete, it's a DSL, i.e. a programming language, not declarative, and you're in "not invented here" land.

If not, the possibilities for it are a subset.

That’s provable; so, if you’re worried that your UI is too complex to easily represent with the existing declarative structure, go for code.

What if you need dynamic layouts? What if your layouts are determined at runtime by dynamically loaded plug-ins? What if your layout needs to be generated differently based on themes or screen sizes? Who knows?

It seems entirely reasonable.

The same goes for other declarative solutions and projects. Eg. Terraform.

I mean, the author literally says a) it’s a trade off for velocity, b) they’re more concerned with performance, c) the implementation in slint is not complete and d) they’re considering writing a very very low level UI library to get what they want.

…maybe, they just want different things to what you’re offering?

It doesn’t mean they’re “wrong”.


I've worked on complex UIs for creatives based on XAML.

An application like Photoshop is composed of relatively simple controls (property sheets, etc.) that are combined in complex ways. (Users can add UI elements such as a color picker, dock them, have them floating in space.) Graphical UI builders are fine for the controls, but there has to be some framework for managing the particular set that is on the screen.


I'm not sure where you read "wrong". Did the message get edited?


No, sorry if my "'s made it appear I was quoting a literal text. What I mean is:

> The author seems to dismiss declarative UI without explaining why.

> From my perspective, declarative is the best way to describe UI.

I think it's pretty clear that the writer here believes that the OP is wrong about slint and declarative UIs.

My point is that OP is not wrong; declarative UI is not the best way of doing UI for all use cases.

It is simply one way of doing UI.


Those are scare quotes around the word. It's not quoting the parent comment.


Declarative UI, even when implemented coherently within the same language like in Flutter and MAUI/Xamarin, so there is no need for an ugly bridge between two worlds, still leads to an unreadable nested hell.

Surprisingly, imperative GUI creation code is much easier to read and modify.


I respectfully disagree. In my experience, Declarative UI is more concise, making development and changes easier. It enhances productivity and allows for a clear separation of concerns, which is beneficial.

Moreover, Declarative UI enables the development of powerful tooling and visual editors. For instance, in Slint, we have an extension that provides live UI preview and code transformation capabilities. We are also actively working on a visual editor that lets users drag and drop widgets. Such capabilities are difficult or impossible to achieve with imperative APIs.


> We are also actively working on a visual editor that lets users drag and drop widgets. Such capabilities are difficult or impossible to achieve with imperative APIs

I don't think you have ever used an imperative API, have you?

I used VB, then Delphi, then C++ builder and now use Lazarus, and there's no declarative equivalent that is easier or faster than those.


In the examples you mentioned, the UI is typically described in XML or form files (.dfm), which is essentially a declarative approach, and clearly not described in imperative code. Although they may not be intended to be edited by hand, the underlying representation is still declarative in nature.


Agreed, but they were easily achieved with imperative APIs.

This is the statement I was contending:

> We are also actively working on a visual editor that lets users drag and drop widgets. Such capabilities are difficult or impossible to achieve with imperative APIs

Maybe I should have said "there's no declarative-only equivalents that's easier or faster".

As someone else pointed out elsethread, declarative is a subset of imperative, so it's easy to produce an imperative API that consumes machine-generated declarations, but not so easy to use a declarative-only framework to reproduce the ease and speed of Delphi, Lazarus, etc.


Have you looked at tools such as Delphi, Lazarus etc and the approach used to build GUI components which can be added at build time or at runtime?

If you are interested in talking about this, do let me know.


Hmm? I've written a fair bit of React, which I assume fits the declarative description, and I never got the impression that it leads to an "unreadable nested hell". You split up your big functions into smaller reusable functions, just like in all other forms of programming.


By the traditions of desktop UI, React is imperative (or maybe immediate): it's written in a declarative style, but ultimately JSX is just JavaScript.

The best analogue to "declarative" in desktop UI terms would be to have your DOM templates in separate files whose structure you don't manipulate in code.

I honestly have an overall negative view of React, but its method for letting you declare UI structure is really fantastic.


Absolutely. I actually prefer a combination of declarative + imperative code similar to how it is in Delphi/Lazarus.

The overall layout of the GUI is designed using a GUI designer and this gets saved as a "form" in a resource file which is loaded at run time and so the actual code does not have a lot of declaration of the GUI but it is read in as a stream from a separate resource file.

However, to this GUI you can add event handlers, add or remove components at run time, etc. All of this is quite clumsily done (IMHO) in frameworks such as Flutter.


Most plugin UIs are extremely simple compared to a DAW, and have the nice property that most of the time they aren't visible.


This is absolutely not the case for many DAW plugins


Which part isn't the case?


Plugins in DAWs are graphically complex, animate, and exhibit plenty of the issues described about DAWs themselves


Yes, but a DAW must do that dozens/hundreds of times more and is on screen for the entire lifetime of the program.

I stand by what I said: most plugins have nowhere near the complexity of a DAW. Some do! (Kontakt/Reaktor, and so on) but the vast majority only have a small subset of what the DAW needs to display, and only need to do it for a limited number of channels, and only need to do it rarely.


I find Slint's font rendering odd and somewhat off-putting. It doesn't look sharp, especially on classic 96 DPI displays. Big text looks inconsistently both fuzzy and aliased, and small text has inconsistent intensity. Lack of subpixel rendering and hinting? Rounded corners look sharp in comparison.


It seems you may have used the FemtoVG backend, which is the default if Qt is not installed and is written in pure Rust. We offer multiple rendering backends, and you may have better results with the Qt or Skia backend, which utilize native font rendering. Additionally, a recent change made improvements to address this issue in our FemtoVG backend: https://github.com/slint-ui/slint/pull/2618


I was primarily looking at demos on the homepage.


That's unfortunately an issue I have seen with several new GUI frameworks over the last years: with High-DPI displays getting more widespread, they tend to push subpixel rendering support down the road.


And understandably so. Subpixel rendering only works on some kinds of displays (many modern displays don't have 3 vertical subpixels per pixel), makes the text look kinda bad due to the wonky colors, really doesn't mesh well with any sort of transparency or even coloured text, and it requires that you keep 3 images of each glyph in memory to account for the 3 different subpixel offsets. It also deeply entangles detailed knowledge of the monitor into your font rendering. You need to re-render everything and update all text any time the orientation changes or the window is dragged between monitors. And how the hell do you handle a window stretching across 2 different monitors?

That's not to dismiss subpixel rendering, there is arguably a legibility improvement to trading off color accuracy for horizontal resolution, but it's really no wonder that new frameworks don't bother and that old frameworks are losing the ability to do subpixel rendering. Apple, for example, ripped out subpixel rendering from their frameworks a long time ago (before Retina displays, IIRC).


> Apple, for example, ripped out subpixel rendering from their frameworks a long time ago (before Retina displays, IIRC).

...or at least when Retina displays were still rare and expensive, probably to encourage their adoption.


Probably because just using higher resolution is easier if it's available. Taking advantage of the physical structure of a pixel seems weird to me.


Actually, CRT monitors did "subpixel rendering" automatically when the beams hit the subpixels only partially. So using this technique on LCDs is just trying to emulate a feature that CRTs have built in.


While it's not super clear, I think TFA's issue with Slint is the markup file (new syntax, not rust, special tooling/dev environment?), not the declarative bit. After all, they didn't say anything about Iced which is also declarative.


Good point. I don't see the issue with a new syntax either. GUI libraries always come with their own set of large APIs and conventions, so learning a new syntax isn't a big deal. The special tooling and dev environment in Slint are optional. But if you do use them, the Language Server Protocol (LSP) allows them to be integrated everywhere and provides handy features like code completion, navigation, and even a live preview.


> From my perspective, declarative is the best way to describe UI

I agree.

It seems like we have some good solutions for declaratively describing the structure and static appearance of UIs, but beyond things like transition animations, we don't have anything that satisfactorily describes their behaviour. It's a hard problem, but I sometimes wonder why this aspect is under-served by the technologies we have.


As a previous Qt user, I never got on the QML train and stayed firmly in widgets land. Declarative UIs work well, until they don't because of custom requirements, and then they simply get in the way. They are harder to debug and understand from a code perspective. Not everyone writes apps that target both desktop and mobile.


Do you have plans for wgpu?


We're working on interop, so that when you select Slint's Skia renderer on top of Vulkan or Metal, you can import a wgpu texture if wgpu's Metal or Vulkan adapter was chosen.

What kind of feature set do you need?


I've worked on big complex UI projects that needed to run on embedded hardware, using different tech. Some of my observations over the years, if it helps:

* The DOM isn't "slow" per se, but using the DOM for this kind of application is usually the wrong path to take. As soon as I need something even slightly off the well beaten path of web applications I tend to reach for canvas based solutions.

* Libraries like EFL are quite nice to work with but also quite low level. By default that means C, but there are bindings for other languages.

* Some of the things you describe (like rendering spectrographs) I'm not convinced are as CPU intensive as you think they are? Have you benchmarked? Winamp was doing these kinds of visualizations back in the 90's. I kind of doubt you really need to use shaders to render a typical DAW if you're smart about which parts need immediate mode rendering and which can use retained mode. Using render targets etc. seems like a smart way to do a DAW, as most parts are actually quite static and only move around the screen (see the sketch after this list).

* I have a feeling - just a feeling - that a lot of people choose Rust for projects because it's one of the newer shinier languages, but as you've discovered, maybe that's not necessarily smart. I would choose Rust for its concurrency paradigm - which might be useful for a DAW! - but for building a UI I would actually personally choose something JavaScript-based. The productivity is just so much better than most other languages, for building UI's. This is not just my experience - I worked for some time at a large public listed US company who used JavaScript for all their big UI projects on embedded hardware, what they changed over time was the underlying rendering engine, but they kept JS because 1) hiring devs 2) productivity.

Good luck, it's a super ambitious project, but I'm sure it's amazing fun to work on! :)
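To illustrate the render-target point from the list above, a sketch (the draw helpers are made up): render the static chrome once into an offscreen canvas, then each frame is just a blit plus the genuinely dynamic parts.

  // Retained part: drawn once (and again on resize or theme change)
  const panel = document.createElement('canvas');
  panel.width = 800;
  panel.height = 200;
  drawStaticPanel(panel.getContext('2d')); // hypothetical: bezel, labels, knobs

  // Immediate part: runs every frame
  function frame(ctx) {
    ctx.drawImage(panel, 0, 0); // cheap blit of the cached static layer
    drawMeters(ctx);            // hypothetical: only the moving bits
    requestAnimationFrame(() => frame(ctx));
  }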


Regarding drawing analyzers, spectrographs:

I feel it has actually become a lot more expensive to draw individual pixels to a screen with the CPU compared to 20 years ago.

Perhaps it's just because the resolutions are higher, but maybe the architecture (hardware or OS) has just changed.


With CPU, totally.

As an example I can look at my own Sonic Visualiser application, largely written 15-18 years ago and entirely CPU-driven. Relative to then, it's now horrible on contemporary Macs for example - it feels far slower than it did a decade ago. It just isn't what the hardware expects.

(There may be an element of toolkit-platform impedance and simple poor design on my part - it uses Qt and feels quicker on other platforms - and I don't want to argue the details here, but I think the basic principle that you really want to avoid CPU in the frame update is sound. Preparing things on a non-time-critical path via CPU should be another matter however, there's quite a lot of capacity there.)


The “fast path” for macOS exists. I don’t know what is happening in Qt-land, but if you want to throw pixels at the screen really fast, you can do it through Core Animation. You can feed it a buffer of pixel data.

My experience is that this is extremely fast.


Good to know about. This was also my intuition, that there must still exist ways to blast pixel data quickly, but it's not the "happy path" in modern graphics API's.


Blasting pixel data from the CPU to the GPU is normally about as "happy path" as it gets. The entire architecture is designed to make that kind of operation super efficient and super fast.

It's the opposite direction (GPU to CPU) which is a pain in the ass. Still fast if you do it correctly, but it's easy to end up with a stall.


Unfortunately modern macOS renders everything at 5k and downsamples; the rendering can do pretty significant stuff behind your back when not using Metal (like extending bit depth to 16-bit, colour lookup to AppleRGB, trimming bit depth, with threads!), and it is buggier than before too. This stuff is way more expensive than the CPU rendering itself.


It isn’t really though. If this were true rendering text would be super slow - but it isn’t. Text shaping and rendering is done on the CPU on all the major platforms and is simply highly optimized.

And people throw out "15 to 20 years ago" like it was a long time ago - but OS X Jaguar is more than 20 years old now (that's the version that brought the current design of 2D desktop rendering with a compositing window manager to Mac OS X)


> It isn’t really though. If this were true rendering text would be super slow - but it isn’t.

I wouldn't say so, I regularly profile my system or apps and text rendering is definitely one of the things that come up the most


> text rendering is definitely one of the things that come up the most

Ok this doesn’t dispute what I’m saying at all and I would think you would know better. Do you have profiles from 15 or so years ago that show text rendering was a substantially smaller percentage of time.

And if you do - ensure it is comparing apples to apples with subpixel (or at least greyscale) hinting and full featured shaping. Neither of these are related to blitting/blending performance but they are relatively expensive.

Modern text rendering is super expensive (but by modern we’re talking on the order of 15 to 20 years, not yesterday). People are making claims that CPU 2D rendering has gotten more sluggish in that time frame - though nothing fundamental has changed other than the prevalence of “Retina” in some circles - but that isn’t something that has scaled past CPU performance. This all works because historically these systems have been painstaking about only drawing when needed what’s needed.

If anything has changed it’s that - software in general for a number of reasons has gotten much sloppier about this.

And in the case of Mac OS specifically - they continue to throw all kinds of shit in the CPU rendering pipeline. If you just do things yourself and blit to a Metal texture, nothing is fundamentally slower.

I contend that most of what people are claiming are some fundamental hardware changes (that haven’t happened - things are faster across the board but otherwise it’s the same basic stuff) is due to less effective application software methods. For 2D desktop apps the GPU is not the factor some people seem to think it is.


Interesting. To be fair it's not something I've experimented with personally - I'm just extrapolating based on pure CPU clock speeds.

I have definitely encountered the issue that it's hard to draw pixels using many modern graphics API's, but I would have thought there was a way to make it fast.

I guess it depends on what you're drawing and what the primitives are too. Interesting topic, and another reason making a DAW must be a fun project!


I agree web stuff is really the best way to develop UIs. Good luck making responsive stuff in C++ for example. The paradigm of HTML, CSS, and JS is extremely powerful and even allows you to use canvas, webgpu, wasm.

There are multiple commercial projects that use web dev paradigm for GUIs:

https://coherent-labs.com/

https://ultralig.ht/

https://sciter.com/


I've done some spectrogram stuff in the past, and the bottleneck (eg in JUCE where you'd want to do this kind of thing) is in the path rendering. It's pretty cheap to compute a big FFT, even throwing it on separate thread if you care enough for that. Rendering an anti-aliased path that looks decent on a small display panel with the basic vector path tools however isn't great.

Not saying it can't be done, because people do it. But making something that compares to the latest/greatest and is fast isn't obvious.
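One way around the path bottleneck is to skip paths entirely for the spectrogram and write pixels directly; a rough sketch (assuming `bins` is a Uint8Array of magnitudes, e.g. from AnalyserNode.getByteFrequencyData):

  function drawColumn(ctx, x, bins) {
    const col = ctx.createImageData(1, bins.length);
    for (let i = 0; i < bins.length; i++) {
      const v = bins[i];
      const o = (bins.length - 1 - i) * 4; // low frequencies at the bottom
      col.data[o] = col.data[o + 1] = col.data[o + 2] = v; // grey ramp
      col.data[o + 3] = 255;
    }
    ctx.putImageData(col, x, 0); // per-pixel writes, no anti-aliased paths
  }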


> for building a UI I would actually personally choose something JavaScript-based

The author does address this at [0]. Their main arguments are that web-based stuff is slow and requires lots of memory, especially when making something which runs outside a web browser.

[0] https://billydm.github.io/blog/daw-frontend-development-stru...


Yeah so one tech stack I worked with was EFL with JavaScript bindings. That was fast and compact, even the JS stuff.


I'd think C++ and Dear ImGui would be a good fit for this work, but many other approaches could work fine.


> I worked for some time at a large public listed US company who used JavaScript for all their big UI projects

Could you elaborate?

Did you use a custom framework or maybe something like React or Svelte? Maybe pure vanilla?


What we need is a AAA game where the main character fiddles around on an in-game DAW in some abandoned recording studio to unlock a cool zombie-blasting weapon. That will guarantee that any widget redrawing therein happens as efficiently and responsive as possible.

Maybe put it in the first level so that audio engineers can easily get to it and just hang out in that part of the game to do their work, glitch free.


I do feel like there should be more crossover between video game & GUI development; most game engines are crossplatform nowadays, GUIs in games are nothing new (and probably reinvented all the time), and everything is GPU rendered already.

I've yet to see any desktop app that isn't a video game built in e.g. Unreal or Unity.

I did see a colleague build a website widget (iirc it was about placing furniture or a TV in a living room to see what it looks like) using Unity though, that was pretty cool.


I have seen a handful of desktop software apps built in Godot over the years. For example, https://bauxite.itch.io/bitmapflow is an app which I've gotten value out of built in Godot.


My favorite Godot app is Material Maker (https://www.materialmaker.org/). I downloaded it when Substance Designer was acquired by Adobe. I'd say it actually has better usability than Designer.


Because Unity's old UI system (uGUI) is actually not that good? From my experience, using web+Electron is just so, so much easier.

By the way, SimCity's UI is done with a custom version of WebKit. https://twitter.com/MaxisScott/status/310835756107177984


> I've yet to see any desktop app that isn't a video game built in e.g. Unreal or Unity.

https://arkenforge.com/



There are a bunch of those in VR. Probably more fun than on a 2D screen in pseudo 3D.


While reading the original post, I kept thinking about Nanite


> This performance is not a problem if the app is small or the app already redraws every frame like a video game.

Seems to me like immediate mode UIs were dismissed rather too quickly. Why not update a DAW UI like a video game? It has similar realtime requirements.
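i.e. something like this sketch (the draw helpers and the state snapshot are made up), which is exactly a game's main loop:

  function frame() {
    const s = snapshotEngineState(); // hypothetical: read from the audio side
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    drawTimeline(ctx, s);            // hypothetical draw functions
    drawMixer(ctx, s);
    requestAnimationFrame(frame);    // redraw every vsync, like a game
  }
  requestAnimationFrame(frame);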


Yeah, immediate mode seems like a perfect fit for a DAW. I see a lot of common misconceptions in the article. It seems like this is mostly an opinionated list of hypothetical issues imagined by the author, not something backed by a lot of data gathered from implementation experience.


Yes, I also agree that the author is wrong to dismiss immediate-mode toolkits. Immediate mode would be my first choice for a DAW application. I would even say it's almost a no-brainer, as it makes so many things easier. You would need to be careful when rendering data-heavy widgets (e.g. waveforms), but the approach goes a long way with some careful planning, even without necessarily implementing complex texture caching etc.


Immediate mode GUIs have significant limitations. You can probably lie to your end users by saying the API is immediate mode styled, but actually retained under the hood (and I think to some degree, this already happens today with cached state), but I think that defeats the point.


It's a trade off.

With ImGui you can definitely have a GUI that is:

  1. Very complex
  2. Able to use custom, complex widgets
  3. High performance
  4. Written very quickly with a very simple API
Those are significant pros.

There are a few serious downsides as well:

  1. It is not remotely conforming to the platform's standard UI (a bit less a problem for a DAW since they all have custom GUIs and widgets anyway)
  2. The font rendering is done by the GPU using cached textures, so it's more limited in terms of what you can achieve there (not so much a problem for DAWs because you don't have much text to display)
  3. Power usage can be high unless you use tricks to adjust the frame rate dynamically
  4. It's in C++, which can be a con for some people (although it's written in old-school C-like style)
If you can live with those, it's a great solution.

People do crazy stuff with Dear ImGui: https://github.com/ocornut/imgui/issues/5886

(And keep in mind that many of those are weekend / hobby projects, by people who prefer to focus on other things than the GUI of their projects.)


"Immediate mode" is a description of the API, not the implementation. It's not a lie to cache state under the hood. In fact it's sensible and doesn't defeat the point at all.


This is incoherent. The implementation is shaped by API limitations. Immediate mode APIs that hold data on the GPU are not immediate, they're by definition retained.


The people who actually make immediate mode GUIs disagree with you. https://github.com/ocornut/imgui/wiki/About-the-IMGUI-paradi...


ocornut's own bulletpoints are incoherent. He can call it whatever he wants, it doesn't mean it's meaningful.

The fact that he's designed a retained mode library with an immediate mode API doesn't mean that it doesn't have literally the same limitations, because he doesn't expose the abstractions to you to actually do meaningful retained work.

I don't care what made up paradigm someone else has invented.

Anyone who was doing immediate mode graphics programming before shader-based pipelines knows exactly what it is, because you would have used the APIs from the standards organizations themselves.


Yes, another immediate mode / retained hybrid GUI toolkit is NanoGUI. It uses NanoVG under the hood (which is used by a few DAWs already).


Not the coolest answer, but I feel like Qt is gonna be hard to beat here. It has a ton of this functionality already, signals are not the enemy for something this complicated and time-driven, it runs on anything and performance is great, and it's pretty powerful/flexible when you need to whip up custom graphics stuff. QML and JS are really nice to have. I wrote a modular synthesizer GUI with it (https://github.com/ohmbre/ohmstudio.git). Took a gamble on Qt 5 because I wasn't a big fan of Qt 4 or below, and don't much like KDE, but it turned out perfect. Just never came across anything and said "ew, well that's gonna be hard with the tools I have." There was always something handy. The GUI never got in the way of audio performance; as long as it's on its own thread and using ring buffers etc., you're fine.


I'm not sure Qt or any other GUI toolkit is well suited for this. Every widget is custom and many require high framerate / low latency.

It's probably best to treat it like a video game and just render everything every frame. Only use GUI events to affect the underlying data model; don't trigger any GUI updates or rendering from them. In this light you're looking for fast software or OpenGL rendering as the key feature in the toolkit, but you still want menus and toolbars.


Well, Qt does use a GPU pipeline for all its rendering, in every backend there is, including OpenGL.

Every widget in my app above is custom as well; it was meant for a Raspberry Pi and a tiny touch screen. If visually customizing an existing Qt widget isn't enough, you just go one or two abstractions up the tree, inherit from there, and make it do what you want, using not only custom code but also many smaller Qt parts. For instance, in my app you grab cables out of modules and plug them into other modules, which is not a common UI thing. You just grab some of Qt's bezier shape rendering stuff and draw the cables with it, and it was trivial to add gravity and make the cables springy.

It's these tools and this maturity that are going to end up making your life easier on a project like this. If you go with a young framework that exists just because some sassy guy is very opinionated about how you should be forced to manage things like state and mutability, you will end up having to ram things in, and mostly working from scratch.


> many require high framerate / low latency.

But that's the thing: unless you are _recording_ audio, you can cache and pre-generate the visualisation beforehand. FFTs, even on 24 tracks of 192 kHz 24-bit audio, aren't that taxing.

Even visualisations don't have to be that low latency. So long as the audio tracks are in sync you'll mostly be fine.

For waveforms you can literally side-scroll the visualisation. You're not going to be pushing more than 60-75 Hz, which gives you 13 ms per frame at worst.
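A sketch of the side-scrolling idea (assuming `strip` is a wide offscreen canvas holding the whole pre-rendered waveform):

  function drawScrolled(ctx, strip, playheadX) {
    const w = ctx.canvas.width;
    const h = strip.height;
    // copy a moving window out of the pre-rendered strip; no waveform
    // math happens per frame, just one blit
    ctx.drawImage(strip, playheadX, 0, w, h, 0, 0, w, h);
  }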


I think the art of creating your own UIs was lost once Windows started dominating; prior to that, every single game or app had to create its own GUI from scratch. It only somewhat survived in gaming.


I develop https://ossia.io with Qt (Widgets + QGraphicsScene + QRhi) and it works as I want it to


Which you can do with Qt. It's a bit unwieldy, but it's performant and flexible.


>> Which you can do with Qt. It's a bit unwieldy, but it's performant and flexible.

Not doubting that it can be done with Qt, but I don't think Qt is really optimal either. SDL is ideal for the rendering, but has none of the regular toolbars and such, and IIRC no multi-window support. I'd actually recommend GTK for this one, since it is lighter weight and the application doesn't need a ton of actual toolkit functionality.


I have a feeling desktop development is going to slowly revert back to classic OOP paradigms just like the web world detoured from MPA -> SPA -> SPA+SSR -> back to something that looks a lot like MPA again.

Classic toolkits like GTK, Qt, UIKit/AppKit, Swing and JavaFX use OOP to solve some of the problems this article talks about.

However, this OOP model seems to be somewhat incompatible with Rust's strict memory safety rules. At least I think it is, because I haven't seen any big efforts toward a Rust OOP toolkit like this. Most of the existing Rust toolkits are immediate mode or ECS or something "reactive" that looks like React.

While I understand the idea of moving away from "unsafe" languages like C/C++ to something like Rust, I wonder if Rust is the right language for building UI applications? Maybe a better architecture is to use Rust for the core logic as a library to be consumed by a higher level language that meshes well with these OOP widget tree and event handling paradigms.


I think we need other UI models instead of everything mature being object-oriented. Doing UI work in FP (or FRP) is great in many aspects until you need to integrate with these OO models like the DOM, et al. Direct integration or a first-class VDOM-like model would be a nice step. There's a tangential issue with all of the popular game engines being built around objects.


Even Lisp programmers decide to pragmatically use an OOP model when doing GUIs.

The idea of using FP for everything is about as sensible as the paradigm of using OOP for everything.


You say that like CL programmers are inherently functional, which has never been true. They have always embraced OOP when needed


Less, I would say; because using OOP for everything is straightforward, just obtuse, verbose and possibly slow.

You don't have to solve paper-worthy academic puzzles to use OOP everywhere.

Using OOP everywhere has been done and probably continues to in some shops. There was a time in the 1990's when nobody was fired for using OOP everywhere.

In Common Lisp, absolutely every value you can create has a class: the class-of function is meaningful for any value. You can replace all the defun forms in a Common Lisp program with defgeneric and defmethod.


What you're asking for is what Compose does (the new Android toolkit). Being pure FP though does have issues. There's no way to get a reference to anything on the screen, so some tasks that are basic and obvious in an OOP toolkit turn into a strange hack in Compose. For instance to focus a text edit you have to create and memoize a "focus requester", pass the "requester" as a parameter to the TextField function, then "launch an effect" and finally you can use the requester to request the focus.

Compare to an OOP toolkit: call edit.focus() when the UI is first displayed and you're done. The reason you need a requester in Compose is because everything is FP and lacks identity, even though there actually is object identity buried deep inside, it's just hidden from you.

The FP model does have some benefits, but I think OP is right and we'll end up with a less radical mixed approach.


Is there a practical functional model for GUIs?

If something doesn't exist, maybe the reason is "it's a poor fit".

If you're going to assert that FP is a better fit for GUIs, you need some demonstration of that, especially since it's been decades since this meme started.


> Ideally I want to support loading user-generated themes.

I really hope this just means "alternative colour palettes". Otherwise they're about to make one of the hardest UI programming tasks I can imagine exponentially harder.

I've never understood the passion for themeability. It's like going into a restaurant and expecting to participate in the cooking of your meal.

As much nostalgia as I have for the heyday of Winamp skins, most of them were awful and had terrible usability. It was a garish novelty akin to "Pimp My Ride"


> It's like going into a restaurant and expecting to participate in the cooking of your meal.

Lots of people expect and do this all the time. I want this cooked like this, no this, this replaced with that, etc. Hell, some restaurants just put out all the food they have on a big table and say "go ahead and put together your own meal with whatever you want, we don't care", and that is many people's favourite restaurant experience.


Are you disagreeing with my views on theming or just my poor choice of metaphor? ;-)


I personally agree with you, but I know that I'm not necessarily in the majority. Much like many people seem to prefer buffet restaurants to restaurants with a fixed menu.


Parent obviously meant you'd be in the kitchen, cooking. Not barking orders.


On the other hand, the Reaper DAW has colour themes and layouts, and also has a very performant UI, just like Winamp.

I guess you just need to be as good as Justin Frankel ;)


Naturally, Justin developed his own UI framework. Parts of it are open source: https://www.cockos.com/wdl/


Completely agree. I'd even go so far as to say that beyond pure accessibility and compat concerns (color blind modes or high contrast etc.), even that is going too far with customization.


After decades of development the GUI space feels more chaotic than ever. There are several dimensions that can become critical bottleneck in advanced use cases (performance, platform independence, ease of development, library/tool availability and maturity etc) and - as this post highlights quite clearly - it is non-trivial to chart a path.

Some of the difficulty is definitely intrinsic to the problem space. Nobody said that arbitrarily complex interfaces have a license to run at lightning speed with minimal coding on arbitrary and/or minimal hardware.

But I do have the feeling that the difficulty is in part because the different approaches (both historical and current) have been effectively islands, self-contained with no standardization or separation of concerns. E.g. I would at least hope that by now we would have an accepted universal way to describe GUIs (declaratively). I think this might be one reason for the popularity of web-based approaches: while HTML/CSS was made for documents, not GUIs, it is at least something everybody agrees on.


The problem is not really in the toolkits, it's in looking at how the performance emerges from the design.

Layout is the fundamental reason why GUI updates can be expensive: if you reduce it to a more static form, layout becomes cheaper, so recomputing it frequently becomes viable, and you can start caching more results without causing trouble.

What's the reason not to use static layout? Configuration. But configuration is something the user doesn't want to do a lot of either. If that's what they spend their time on in the GUI, they aren't making forward progress on the task.

So there's actually a self-alignment: a GUI that works well is one that can stay relatively "flat" in its performance profile, too. Trying to address this at the toolkit level isn't abstract enough, because the things you do want to change, you most likely will see as either "this always updates every frame" or "this is a user action that triggers a full layout recompute". That isn't something that a general-purpose framework can provide - it can't know that sometimes you have a window that needs to completely recompute its layout, and other times, it will never change or even move position. It's going to provide a vocabulary that favors one or the other, but you have to provide support when you need the more exceptional of the two.

IME I usually have done fine pushing the IMGUI approach farther. The problem it tends to face is with something like a scrolling list of elements of variable size: making that list automatically line up cleanly and efficiently is quite a lot of effort. But...do you need that design? Will fixed size elements work? Or pagination?

That's the kind of thing where it's tempting to say that you do need all the features, but it can quickly fall away from any specific use-case and turn into an aesthetic preference.
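FWIW, fixed-size rows really are the cheap path in immediate mode. A minimal sketch with egui/eframe in Rust (assuming a recent eframe; run_native's signature has shifted between versions) - ScrollArea::show_rows only asks you to lay out the rows currently in view, so per-frame cost tracks what's on screen, not the 100k items behind it:

    // Cargo.toml (assumed): eframe = "0.22" or similar
    use eframe::egui;

    struct ListApp {
        items: Vec<String>,
    }

    impl eframe::App for ListApp {
        fn update(&mut self, ctx: &egui::Context, _frame: &mut eframe::Frame) {
            egui::CentralPanel::default().show(ctx, |ui| {
                let row_height = ui.text_style_height(&egui::TextStyle::Body);
                // show_rows hands us only the visible index range
                egui::ScrollArea::vertical().show_rows(
                    ui,
                    row_height,
                    self.items.len(),
                    |ui, visible| {
                        for i in visible {
                            ui.label(self.items[i].as_str());
                        }
                    },
                );
            });
        }
    }

    fn main() -> Result<(), eframe::Error> {
        let items: Vec<String> = (0..100_000).map(|i| format!("sample_{i}.wav")).collect();
        eframe::run_native(
            "file browser",
            eframe::NativeOptions::default(),
            Box::new(|_cc| Box::new(ListApp { items })),
        )
    }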


> But configuration is something the user doesn't want to do

I guess you don't make music, which is what this post is about. If people didn't want configuration, they would use Audacity and just record. How are you going to paginate a synth's oscillator? Volume meters?


It's weird that the article mentions Makepad, but somehow fails to observe that Makepad's primary demo application is actually a software synthesizer. Go watch some of Rik Arends' talks! Most recently this one from RustNL last week: https://www.youtube.com/watch?v=9Q4yNlbfiYk (first talk of the day).


It's ok, we have been pretty quiet. Our primary launch goal is to have a full visual designer/IDE, not really push the UI framework stand-alone. However progress is happening and we are getting closer to that point as you can see in the talk above.


It's kind of funny how the author dismisses web technologies while complaining about redundant rerenders in Rust GUI libraries. This is all solved in browsers, both on the rendering engine and on the web framework library levels.

Yes, you need to pay attention to performance, but the premise that "web is slow" has been disproven so many times already. Just because it's really easy to write apps that are slow (and thus such apps are ubiquitous) doesn't mean that you can't write performant apps if you are so motivated.


Alright, here's what I want you to do. It'll be interesting and fun, you'll make something cool, it's free, and then you'll come back and look at your comment and reply with a "/s."

Download pirated copies of protools and melodyne, and a bunch of virtual instrument libraries from torrents. Download a midi file for a song you like, dump it into protools. Record yourself singing, right on the PC or headset mic - no need to hit the right notes. Dump the voice instrument and your voice recording into Melodyne and drag the voice to match the notes on the vocal tone track. Now set up a link from your Melodyne track into a protools track.

Then go wild, play around, apply filters, do fun stuff. You might get stuck in the fun for a week and it'll keep you awake late.

Once you're done, think about doing all that on a web page or a web page pretending to be a local app. Then reread your comment.

Oh, by the way, you will need to install a new raw driver for your soundcard. You see, the ~5ms audio latency you get from your normal Windows sound driver is too slow. Give some thought to how slow doing anything in JS is going to be, if even a native hardware driver is too slow.


Until last year, I worked on a team that built an Electron app for video conferencing with a double-digit-million user base. We did audio processing (mixing, RNN noise reduction) via WebAudio in WASM. This is in production right now as far as I know and adds very little latency.

That being said, this is about GUI. No reason not to do as much native processing as your heart desires outside of your GUI layer.


> This is in production right now as far as I know and adds very little latency.

You're not running 20 different tracks with compression, reverb, chorus, delay + send/return tracks + sub mixes in your app. Your app has absolutely nothing to do with a DAW. You're not taking MIDI input or running virtual instruments either. Try to do that in a browser with a latency < 10ms like Cubase or Logic and see how it goes.

Again, your product has absolutely nothing to do with a DAW and its constraints, so don't try to claim it's an achievement of some sort.


I'm not sure how any of this is relevant with regards to how you build your GUI. If you have special requirements that require you to do your audio processing in native code using low level system sound APIs then there is nothing preventing you from doing so while still building web based GUI on top of it.


I'm not sure how your video app doing some light sound processing is relevant to the performance required for a DAW in the first place either.


I have used Cubase for 25 years. A video conferencing app is just not the same category even remotely.

This is like trying to build Maya or Blender in a browser without an art background but having to deal with ms latency.

I just don't see the point. I don't even like Logic because the Cubase workflow is transparent to me at this point.

Even the idea of grabbing pirated DAW software to get an intuition is kind of absurd if you have never made music at all. It is exactly the problem I have had with trying to learn Maya. I can't even draw something on paper. The idea that a dev with no background in the domain is going to add something to software that has been evolving from very specific domain knowledge in a highly complex and unbounded domain for multiple decades is utterly absurd.


I have not heard a single argument why DAW software can not have a web GUI. Do you have any? I'm genuinely curious.


There have been various toy DAWs that have web GUIs. I've never seen a production-capable DAW with a web GUI.

I suspect that existing web-as-native-GUI frameworks make presumptions about threading that won't hold when a realtime audio callback thread piping data to other audio processing threads and back is the real show, and when the GUI may need things like updated zoom levels dirtying many waveform drawings at once, requiring recalculation of peak values in the visible windows to draw the little lines that make up the waveform - among other things that may make simple partitioning of systems more challenging.
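As a sketch of the partitioning that does work (not any framework's actual internals): the audio callback pushes into a lock-free SPSC ring buffer and the GUI drains it once per frame. In Rust, assuming the rtrb crate (0.2-style API):

    use rtrb::RingBuffer;
    use std::time::Duration;

    fn main() {
        // lock-free single-producer/single-consumer queue:
        // the audio callback produces, the GUI thread consumes
        let (mut to_gui, mut from_audio) = RingBuffer::<f32>::new(4096);

        // stand-in for the realtime audio callback: never blocks, never allocates
        let audio = std::thread::spawn(move || {
            for i in 0..48_000 {
                let sample = (i as f32 * 0.001).sin();
                // if the queue is full, drop the sample - the meter only
                // needs recent data, and the audio thread must never wait
                let _ = to_gui.push(sample);
            }
        });

        // stand-in for the GUI frame loop: drain whatever arrived, update the meter
        let mut peak = 0.0_f32;
        while !audio.is_finished() || !from_audio.is_empty() {
            while let Ok(s) = from_audio.pop() {
                peak = peak.max(s.abs());
            }
            std::thread::sleep(Duration::from_millis(16)); // ~60fps
        }
        println!("peak level: {peak}");
    }

The audio side never blocks or allocates; the GUI side tolerates whatever arrives late.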


A DAW is going to be doing far more audio processing on tens or hundreds of channels simultaneously, while playing back some of the audio from disk, some from memory, some from live inputs, and also receiving, processing, and recording/playing MIDI events. And hosting both the audio and GUIs of third party plugins.


That makes sense, but what does that have to do with how GUI is implemented?


DAW's GUI needs to remain responsive with minimal and consistent latency. If DAW users complained because of DPC latency introduced by Nvidia's video driver, you can imagine how critical low-latency requirements are.

A DAW can use pretty much all of the system's resources (RAM, CPU, disk IO) during playback. If any other thread steals CPU for too long during playback, like the GC of a UI thread, this can introduce audio clicks/pops.

A video conferencing app has tons of network latency, which helps to hide other latencies. That kind of latency would be unacceptable in a DAW. A video conferencing app also doesn't starve the system of resources like a DAW can, so its UI can be more resource-intensive.


> This is in production right now as far as I know and adds very little latency.

Latency in a DAW isn't just at a premium, it's configurable and needs to be reported and compensated for. That means if any audio processing adds latency, all other processing units must compensate with their own internal (or external) delays to avoid the signal going out of phase.
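The arithmetic behind the compensation is trivial - the hard part is plumbing reported latencies through the whole routing graph. A toy sketch of the idea (not any particular DAW's implementation):

    /// Given each track's reported processing latency in samples, return
    /// the delay to insert per track so everything lines up with the
    /// slowest path (what DAWs call plugin delay compensation).
    fn compensation(latencies: &[u32]) -> Vec<u32> {
        let max = latencies.iter().copied().max().unwrap_or(0);
        latencies.iter().map(|&l| max - l).collect()
    }

    fn main() {
        // e.g. a look-ahead limiter reporting 512 samples, a zero-latency
        // EQ, and a linear-phase filter reporting 64
        assert_eq!(compensation(&[512, 0, 64]), vec![0u32, 512, 448]);
    }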


What do you think happens with multiple audio and video streams in a video conferencing use case?

Apart from that, I don't see how any of this is relevant to how you build the GUI for a DAW.


> This is in production right now as far as I know and adds very little latency.

How much? As GP said, > 5ms is instant disqualification. And that is for your whole app, not just the effects pipeline.

GP is right, go use a DAW in a serious setting for a while to build up an intuition of the needs of these users.


How is audio processing relevant to how you build the GUI?


the article is about the _UI_ for a DAW, not the core engine. the UI isn't concerned with audio analysis or applying filters or raw drivers for soundcards. it's just a visual representation of state.


> but the premise that "web is slow" has been disproven so many times already.

No, it has not. Though it seems proven that, even with very powerful hardware, users in general have been pushed toward minimal expectations. So GUIs that take seconds to start or respond, on multi-GHz processors and multi-GB RAM, are considered fast now.

This "web GUIs are fast" argument is more like "it's been proven multiple times that meal kits are cheaper than buying groceries from the store". Yeah, someone can find really cheap meal kits if they spend many hours hunting for a deal. But in the general case, groceries from the store are plain cheaper for most reasonable shoppers.


Define fast. Do you care about frame rates? input latency? startup times? memory usage?

There are trade offs, but if you care about high frame rates and low input latency, web apps can totally deliver that.

Look at https://lighttracer.org/app.html, for example.


I tried that app expecting a responsive experience, but the viewport lags significantly, and when rendering, the whole UI lags.


I tried opening the link and did not even have the patience to wait until "compiling shaders" was done. When I saw the buttons slowly coming up one at a time I literally laughed. Is this supposed to be a remotely good example?


I don't know, it was something that I randomly picked that was really performant on my phone. YMMV.


you're still arguing about the usage of modern web technology, not its actual performance.

> No it is not. Though it seems proven that even with very powerful hardware users in general have been pushed to minimal expectations. So GUIs that take seconds to start or respond on multi Ghz processors and multi GB RAMs are considered fast now.

ok, so php is slow now? statically rendered sites have somehow become larger than they used to be?

the web is fast. just because some developers don't know how to leverage it properly doesn't make it less fast.


The amount of effort it takes to create high-performance applications in a web environment is significant, though, and the benefits you get simply working in a faster language in a desktop environment are lost on most web developers, I think, simply because most of them don't actually do any desktop or graphics API work.

They simply wouldn't know.

Also, UI compositors are not a solved space. If they were, people would absolutely not be using Electron. But no one is saying, "Hey I have a CSS 2.1 compliant rasterizer and compositor that you can use in your C++ or Rust environment!" are they?


> But no one is saying, "Hey I have a CSS 2.1 compliant rasterizer and compositor that you can use in your C++ or Rust environment!" are they?

There’s actually quite a lot of interesting work going on in that general space, has been in various forms for some years. A couple that immediately spring to mind:

• Azul <https://azul.rs/> builds on WebRender, as used in Firefox. I haven’t looked at it for a few years, but it looks to have grown quite interesting now.

• Blitz <https://github.com/DioxusLabs/blitz> is based on from-scratch implementations of CSS layout and rendering, on top of wgpu. It’s not usable yet, but is a very interesting concept. If one happens to be familiar with React Native: it’s kinda like that, or React Native Web.


React Native Web is a bit like scratching your left ear with your right arm :)


Actually, forget the React Native Web thing, I think I was mixing it up with something from a few years back that was more the other way round (running React web stuff on React Native, not running React Native on web).

React Native Web… eh, it has a potential place. Maybe like scratching your left ear with your right arm, because your left arm is partially paralysed, or something!


Right. The point of wanting to avoid redraws is for performance. Sure, you avoid redundant redraws in web browsers, but it defeats the point because the browser is slow compared to native, even with those avoided redraws.


Please, if someone has just that, let me know. It would be excellent for games. I do believe there are implementations of flexbox in rust.


Yeah, it's a problem space I'd like to try to tackle again some day. Yoga et al. aren't enough.

I think people are dying for a web-based compositor in other host language environments, but don't know how to elucidate this concept because they don't know anything else.


Technically, you could ship a minimal link of a JVM and use JavaFX. It has a compositor, a CSS 2.1 rasterizer, layout managers and renders to D3D or OpenGL surfaces. I think they're adding Metal at the moment.

The weakness is interop with C/C++. There's a new Java FFI in the works that makes that a lot better by auto-generating bindings from header files and avoiding JNI, but it hasn't shipped as stable yet. It's usable. You just have to accept that the API might change a bit.

You can also compile the whole thing to a native library that doesn't use a typical JVM at all. Memory usage and startup time is a lot better if you do that. You can also expose functions natively to C, or bind to C natively, using a different approach.

If I were to use that approach I'd write a generic binding layer that lets you connect JavaFX observables to their C++- or Rust-side data structures. FX has an architecture that's very based on observable variables, which all have a uniform API, so if you can control those from the native side then you can do anything - either updating state objects that are in turn connected to UI objects, or directly updating the UI.


i mean, people have done some pretty impressive stuff with React and threejs: https://codesandbox.io/s/i2160. i work in FE with 3D engineers with graphics and game development backgrounds who are at the forefront of the field, working on browser-based applications. we tried desktop apps and found the browser to be a better balance for performance, compatibility, and desired features (data streaming, webgl, React for turning incredibly complex state into 3D/UI) without being too low level. we can also deploy updates easily, without excessive build times, with little to no interruption to our users.

edit: another neat one: https://deck.gl/examples/trips-layer


> but the premise that "web is slow" has been disproven so many times already.

That's news to me. Do you have a link comparing local non-electron apps with local electron apps?

I mean, a GUI chat app I wrote a while back using libpurple (or something with that name, could have been libpigeon), without even trying, started up and loaded all cached chats faster than I could measure at the time (under 10ms).

None of the current electron-based chat apps I use right now take less than 5s, even on hardware (an M1 MacBook) that is literally 15 years ahead.


Is latency of loading the app really relevant in the digital audio workstation use case? (no). As long as you get good frame rates while the GUI is running and latency is reasonably low, startup performance is not that important.


I was responding to this:

> but the premise that "web is slow" has been disproven so many times already

So, yeah, startup duration absolutely counts.

More to the point, even while using (for example) slack you still see +500ms latencies while doing things like typing

Native chat apps had instant (sub 10ms) responses.

But, you made the claim that web being slow has been disproven, which really is news to me, so I want to see benchmarks you are referring to.


Yeah. The OP should spend a day or two making HTML5 prototypes for the key aspects and see if it's really true that it's "slow".

The UI is not the hot code, it's the audio mixing. So long as the audio mixing is faultless, it doesn't matter too much if it takes an extra video frame to update the dB display.

And then the thousands of people working to maintain the code in the browser stack every day are on his side.


I can assure you that users complain when the GUIs of DAWs start to run below 30fps, and really the expectation these days is that "performant" GUIs run at 60fps.


I am suggesting confirming if it is actually "slow" for the "key aspects".

You and the OP are just assuming it can't do whatever it is at "30fps"... if you're right, confirming whatever the worst thing is will only take ten minutes if it's that bad, or an hour or two if it's subtle.

The "extra frame" I was thinking of is graphics pipeline latency, not redraw time.


Well now you're assuming what I'm assuming, which is fair enough because I assumed by your comment that you were saying the UI responsiveness isn't really a concern in DAWs. It's definitely a secondary concern, but people are accustomed to buttery smooth UIs and complain when they're not.

That said, I'd argue that even latency is still a concern for things like meters. If they are noticeably out of sync with what you hear then users will again complain.


There's a relatively new C++ GUI library literally called "Elements". Not sure how it works, but the way it looks, and the music background of its creator, make it appear that DAW applications might have at least been considered.

https://github.com/cycfi/elements

Edit: It does mention VST and AU in the introduction:

> "It should not own the event loop and should be able to co-exist with components within a plugin host such as VST and AU."


I can't recommend a toolkit or a set of widgets because ages ago I came to the conclusion that there's no way to use traditional ones for a good DAW interface, and something very specific to the task should be developed, which is a pain in the ass. For spectrograms, I could suggest a look at Jaaa's source. It draws directly using wrappers on X11 libraries and it's fast and efficient also on low end hardware.

https://wiki.linuxaudio.org/apps/all/jaaa

http://kokkinizita.linuxaudio.org/linuxaudio/


A DAW is so challenging! But in terms of audio effect plugins, I think EGUI is usable already. Here is a Dattorro reverb VST plugin written in Rust with egui and glicol_synth:

https://youtu.be/DLFO4dXzKsg

Since EGUI can be used almost everywhere, I also made an experimental front end for the Glicol music language with EGUI: https://glicol-egui.netlify.app/

There are many features missing, compared with HTML/CSS: https://glicol.org/

The biggest issue with EGUI is that the style seems to be quite fixed. The default style looks great, but some customisation would be more ideal in some contexts.

Although we are not there yet, it's great to see more and more attempts at audio/GUI work in Rust. And I really feel that Rust is the best choice for the audio backend now.


I don't think you can meaningfully use immediate-mode GUIs for anything related to compositing work.

If you're creating debug UIs or basic UIs at best, you're fine. That might actually work out alright for DAWs, but if you want sophisticated user-interfaces where you can add effects to things, you need to start retaining graphics data, and so many people advocating for these old school immediate-mode GUI libraries just don't get that.


Immediate mode gui - pseudocode, where volumeSlider is a slider used for setting the audio volume and spectrograph includes a bar which displays the volume visually:

    function volumeSlider_onChange() {
        audioEngine.setVolume(volumeSlider.value);
        spectrograph.setVolume(volumeSlider.value);
    }

    function stopButton_onClick() {
        audioEngine.getCurrentTrack().stop();
    }

    function audioEngine_statusCallback(Track track) {
        spectrograph.update(track.status);
    }

To do something like this in a declarative UI, you will need some sort of state machine; that state machine will then need a bunch of state variables, callbacks/hooks, providers/notifiers, etc. And you have to manage this for each variable.

That is, there will be a LOT more code that needs to be run before updates flow from the GUI to the engine or from the engine back to the GUI.

Why would you think that declarative UI will be more efficient?


Can you explain this point a bit more? I've always hated retained mode GUI systems and im-guis seem so much nicer to work with. But I've not made something huge with them. What do you mean by compositing in this context?


Not beautiful. That is not something that makes you want to use egui.

(My plugins: https://github.com/rerdavies/ToobAmp, using a web interface; I'm currently weighing options for a GUI backend for a Linux native interface. LV2 plugins infamously cannot use GTK or Qt.) Current leading choices: cairo2, or embedding a browser control.


I'd be concerned about the lack of garbage collection in Rust.

It's little appreciated what a breakthrough garbage collection was for code reuse: it was probably more important for code reuse in Java than object-orientation itself.

If you don't have garbage collection, the API between a library and an application has to address memory allocation and freeing. If the relationship between the library and the application is simple this is not a problem, but if the library needs to share buffers with the application, and particularly if the library constructs complex data structures that are connected with pointers, it is a difficult problem, because often the library is not sure if the application is done with a buffer or vice versa... But the garbage collector is!

(Ironically this is a case where "abstraction", "encapsulation" and such are dangerous ideas because the problem the garbage collector addresses is a global problem, not a local problem. As a global problem can be correctly addressed globally, this is an appropriate use of "abstraction", but if your doctrine is that "abstractions are local" you have baked failure into your mental model at the very beginning.)

It's not accidental that OO languages became popular when GUIs did because OO is a good fit for the GUI domain. Memory allocation matters a lot, particularly if you are building complex applications for creatives. If you're just writing mobile apps and overcomplicated web forms, you can say that "the panel belongs to the form", "the input box belongs to the panel", ... The idea of borrowing becomes highly problematic when you have complex applications where the user can add and remove property sheets and other controls to the UI because you now have a graph relationship between components, not a tree relationship. If you have garbage collection and other tools appropriate to graph relationships you can make it look easy, approach it with the wrong tools and you will always be pushing bubbles around under the rug.


Audio is literally a major domain where GC is a big no-go... (even a GC running in another GUI thread can be an issue if it's a stop-the-world operation)

You won't find a single audio backend in a GC language.

GC is obviated by Rust anyway (its native RAII pattern involves "ownership" which allows "borrowing", things are immutable by default...) and there are Arc and Box types (check out Box particularly).

You could always slap something like Boehm on, lol...

These things need to be solved by the Rust community, regardless, so it's full-steam ahead, and GC is not something tenable when there are ways to control the timing of resource freeing that won't glitch audio playback, which GC would do.

GC is for when you have no idea about the types and lifetimes of the objects you create... when you know these things, why wouldn't you set up proper shrink-wrapped alloc/de-alloc in a time-coordinated fashion?
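For the shared-buffer scenario raised above, the Rust answer is reference counting rather than tracing GC: the buffer is freed deterministically when the last owner drops it. A minimal sketch:

    use std::sync::Arc;
    use std::thread;

    fn main() {
        // a buffer shared between "library" and "application" code
        let buffer: Arc<Vec<f32>> = Arc::new(vec![0.25; 4096]);

        let lib_handle = Arc::clone(&buffer); // the library keeps its own owner
        let worker = thread::spawn(move || lib_handle.iter().sum::<f32>());

        // either side can drop its handle whenever it's done; the buffer
        // is deallocated exactly when the count hits zero - no GC pause
        println!("sum = {}", worker.join().unwrap());
    }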


In a dynamic UI you don’t know these lifetimes because they are up to the user.


Well, in the sense of "lifetimes" that I mean, in terms of ownership, scope, and destruction, yes we do know the lifetime (removing a plugin instance and de-allocating in an orderly fashion is part of a plugin's API, for example).


This post really makes me appreciate the engineering that led to the completely usable and powerful DAWs that ran on the comparatively weak hardware of the late 90s, early 2000s like Cakewalk, Cubase, Logic.


and without GPUs, mind you. And a lot of that trickery probably came down to: let's write everything in assembly. Fruity Loops was absurdly fast in the early 2000s and was written in Delphi and assembly, IIRC.

In 2023, DAWs should not be that difficult to make fast. It's a solved problem. Multi-channel audio processing is one thing that modern CPUs should not even break a sweat at. I believe this article has uncovered the basics that have made every DAW complex from the beginning, but seems to be focused on efficiency with your bog-standard off-the-shelf GUI library. That's the mistake, IMO. None of the DAWs I know of use, for example, standard GTK. Even the ones that do use a GUI toolkit use it minimally, for the file pickers and configuration - not as the primary interface for the DAW itself. You need to go lower level and remove the layers of cruft. I mean hell, adding language bindings on top of everything is going to add yet another layer of inefficiency.


Oh yes, thanks for the reminder about Fruity Loops, another great piece of software from that era.

Not a DAW, but I really miss CoolEdit


Reaper, let's not forget Reaper. A full-blown DAW with a brilliant UI in like 15 MB. Cross-platform too.


Absolutely, such a great project! I understand one of the guys behind the project is doing it more or less out of passion as he made his money from WinAmp back in the day?


oh hey this is a topic i know of somewhat by chance :) i recently challenged myself to make a daw in the fewest lines of code and i came up with a cli based tracker that essentially does this:

- load track.csv

- each column corresponds to 1.wav, 2.wav, 3.wav, 4.wav etc

- for each line, parse the cell of each column for characteristics and play the sound according to those characteristics (i just did on/off with an x, but you could do volume or duration)

- delay before the next line for BPM

what i ended up with was a fun little toy csv-tracker. i was convinced it would be too stupid to be fun beyond the task of making it, but as soon as i saw the 'notes' scrolling line by line.. ooo-weeee what a rush of nostalgia 10/10 would recommend as a fun exercise.
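for the curious, the whole loop fits on a napkin. a rust sketch of it (rust rather than go, to match the article's stack; `play` is a hypothetical stand-in for however you trigger sample playback):

    use std::{fs, thread, time::Duration};

    // hypothetical stand-in: trigger playback of (col+1).wav on its own voice
    fn play(col: usize) {
        println!("trigger {}.wav", col + 1);
    }

    fn main() {
        let bpm = 120.0;
        let step = Duration::from_secs_f64(60.0 / bpm / 4.0); // 16th-note grid
        let track = fs::read_to_string("track.csv").expect("track.csv");
        for line in track.lines() {
            for (col, cell) in line.split(',').enumerate() {
                if cell.trim() == "x" {
                    play(col); // column N maps to N.wav
                }
            }
            thread::sleep(step); // crude BPM clock
        }
    }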

Edit: to actually respond to the article though, because this sounded a bit like storytime, since the author was asking (possibly rhetorically), i think the author -should- stick with Rust. im not saying this due to any fanboy tendencies, 90%+ of what i write is in Go, including the tracker I mentioned above.

what makes me say that Rust is a good fit is because though it is difficult to produce code at a high velocity and with more simplicity in Rust, audio is one of the problem domains where realtime computation is a must, and Rust as compared to Go for example would excel in capable hands due to the way Rust manages resources and leans more toward realtime tasks. We're really splitting hairs here, but be that as it may, i know that it wouldn't take long to be thinking about tweaking the GC if i were doing a 'serious' daw in Go.

evidently i was not the first with this idea https://www.youtube.com/watch?v=RFdCM2kHL64


The article claims that Druid doesn't use damage tracking for rendering, but this is not true, or at least not reasonably so. Druid does track damage regions and pipes this all the way to the OS. The only exception is macOS, where the regions are still tracked but not fully connected to the OS. There is an abandoned PR that even adds the connection.

That said, not using Druid for a DAW project is a fine choice. Doubly so for a new project, as it was correctly pointed out in the article that Druid is now in maintenance mode.


I wonder at (and admire) what DaVinci Resolve is using for its UI.

It seems very polished but is also very performant and is even cross platform.


They're using Qt. All the Qt libraries are there inside the app's package on macOS.


Amazing. I didn't know Qt was so capable in the right hands. It doesn't even look like a typical Qt app.


I don't know much (or anything, really) about DAW development, but as a Reaper user I know you can make a cross-platform DAW that performs extremely well on very few resources, including displaying many instances of all the things the author is talking about (spectrograms, audio files, zoomable midi clips, etc.) in real time, on a Raspberry Pi, while playing audio and driving physical synthesis plugins.


True! Fun fact: Reaper uses its own UI framework. Parts of it are open source: https://www.cockos.com/wdl/


It’s a hard problem, and I don’t think there is a single solution for a DAW UI. On the one hand, there are a lot of UI elements that can be built with traditional widget kits, like the toolbars and panels; even knobs and sliders are just custom widgets, and you can even build a channel strip that way.

On the other hand there are the “contents” or “views” like track view, clips, automation, piano roll, eq etc. I believe those are best to be handled as 2D scene graphs with direct GPU rendering. Even then, each of those DAW specific views presents their own set of challenges which are very involved to address.

So I would definitely choose a toolkit that allows both traditional widget trees as well as scene graph style rendering (for eg Egui + wgpu in rust, or Qml in C++).

There are other challenges too. I’ve always found that synchronizing the UI, the rendering, and the audio thread so that everything feels responsive is very hard.


Isn't Bitwig using a C++ backend with a Java frontend? If it were me, I'd probably choose between biting the bullet and going all Rust, or, if I was planning on interop anyway, going for something a bit more high-level like AvaloniaUI (C#) or a thin Java solution on top, like Bitwig or the JetBrains platforms.


Yes they are.


The writer of the blogpost here.

The popularity of this post has really caught me at an awkward time. I want to clarify some things: https://billydm.github.io/blog/clarifying-some-things/


If you are interested in some help with the GUI, I can help


The author should look at how EDA (electronic design automation) tools are built. They have many of the same challenges but they also have a lot of solutions


For 5 years or so, I've been watching the GUI infrastructure space for my imaginary audio applications, and recently I started to develop an audio application in Rust. What I learned is that there is no single best library in any language. Also, you shouldn't rely on anything: even a very popular thing at the moment can become unmaintained very quickly. So it's better to choose a satisfying solution from what's available currently and abstract it as much as possible, to be able to switch to something else in the future.

I also did some very simple benchmarks for the GUI libraries I was interested in (instantiating 10,000 rectangles, styling them and doing layout), and the leader on macOS was the browser (Tauri), with Slint in second place. But, unfortunately, Tauri is unusable for this kind of application because of IPC. If you need to visualize your audio data, you need to continuously send it to the GUI. The fastest way of communicating for that would be WebRTC, because of the sandboxing. Or you would need to go all WASM, but then you don't have access to a lot of other features you need.

As a suggestion, take another look at Qt. At the very least MuseScore and Ardour use it. Also, in my opinion the most mature Rust GUI library yet is egui, not Iced. FYI, Audulus uses an immediate-mode GUI based on nanovg (https://github.com/audulus/vger#why), and they are developing their own Rust GUI library, BTW (https://github.com/audulus/rui)


> As a suggestion, take another look at Qt. At very least MuseScore and Ardour use it.

Ardour uses GTK 2, and only minimally at that: https://discourse.ardour.org/t/gtk-2-has-been-deprecated-is-...


Maybe this is the way forward: build a Rust SDK which does all the high-performance heavy lifting. On top of it you create bindings (using uniffi) to your favourite front-end language. And you create a frontend focusing just on the front-end stuff.

This was presented at FOSDEM 2023 by Matthew Hodgson, on building Matrix 2.0. 15 minutes into the video Matthew explains the redesign of the Matrix clients for mobile. His conclusion: maybe this is the ultimate stack for building mobile applications.

https://fosdem.org/2023/schedule/event/matrix20/
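For a taste of the Rust side, a minimal sketch assuming a recent uniffi with proc-macro exports (the attribute API has shifted between versions); the Kotlin/Swift/Python bindings are generated from these annotations:

    // lib.rs of the SDK crate; Cargo.toml (assumed): uniffi = "0.25" or later
    uniffi::setup_scaffolding!();

    /// Heavy lifting stays in Rust; frontends call this through generated
    /// bindings instead of hand-written FFI.
    #[uniffi::export]
    pub fn peak_level(samples: Vec<f32>) -> f32 {
        samples.iter().fold(0.0_f32, |acc, s| acc.max(s.abs()))
    }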


I've built a modular midi / audio / video platform on the web, and encountered most of the pain points mentioned in the article.

https://sequencer.party

Some points:

- you need to move all sequencing and audio / midi work off the main thread. I use WebAudioModules and sequence all midi in the audio thread as well.

- you need to move all other heavy lifting off the main thread; for example, for the multiplayer features I use YJS. YJS runs in its own worker to not stall the main thread

- you don't update the DOM, you very carefully redraw parts of SVGs or better, canvases.


I actually know this space fairly well, probably better than most, because I've been an implementor of a UI compositor, which raises some difficult technical challenges with respect to UI tree backing layers and draw invalidation.

I'm still not actually sure what the state of the art on these matters is after having researched this for years, and I suspect even what we do today across software like web browsers, video game UI is wildly inefficient for modern rich user-interface requirements.

For instance, you could tailor GUI software to strictly draw to specific frame buffer/backing layer regions and you could manually invalidate and redraw when the developer specified, but that's generally far too implementation specific for most UI developers to actually follow. A master UI implementor wouldn't have too much trouble with this, but most people don't have a decade of graphics API level experience on top of designing world-class user-interfaces.

So you have some ambiguous UI tree you need to draw and redraw when things move or animate. Today's state of the art is to multithread-draw the space to tiles using virtual graphics commands that later get converted into target graphics API calls; those tiles translate to the corresponding graphics API backing-layer abstraction, some sort of frame buffer.

When the UI tree changes state, those leaves are invalidated all the way up/down the tree to the root at which point the compositor redraws what's necessary and combines the layers into what gets displayed on the screen.

There's a bit of work to separate disparate UI element bounds from layers which require acceleration, but those nodes are flagged for the requirement usually by explicit properties.

This is really complex work, and actually I think only web browsers today do it. I'm not sure even operating systems do, because usually they don't operate with the same challenges that web browsers do where you have total UI flexibility, or as many people using the platform as web developers.

But it's also the only platform that has this architecture.

So if you're building desktop software, you can try to emulate this architecture and end up with some fairly good performance. Even just using nested frame buffers gets you really far. But if you're writing front-ends on the web, you're stuck between the browser's compositor or rejecting it entirely and writing your own stack in a graphics layer that gets pushed to a canvas.

None of this is good, it's all trade-offs, and I suspect the state of affairs will continue for years, because we don't have enough people doing compositing work. It's exceptionally specialized work.

As an addendum, the DOM isn't slow. This is just parroting. It's total crap. People who say this have no idea what they're talking about. What's slow is everything GPU related. That's not even the CSSOM, either. It's specifically the part where pixels get rasterized. But you don't have lots of people who work on compositors telling you that. You have a bunch of speculators thinking the DOM is slow without actually measuring anything.

Painting and drawing to backing layers is super expensive. Pixel fill is expensive. Changing document subtree node properties is not.

You absolutely need efficient invalidation in UI drawing. I have no idea what this author is talking about.


GTK 4, which the author mentions in their post, uses a retained rendering model so that the UI can be rendered on GPU. Widgets snapshot render nodes which are immutable and can even be cached between frames. Those render nodes are diffed to calculate the damage region automatically.

That damage is then used to scissor clip in GL before the render tree is converted to shader commands. The original damage flows through to eglSwapBuffersWithDamage() to limit the screen damage to the diffed areas. Some work is needed to do tiled/threaded processing of the render tree to reduce pathological cases instead of large scissor clips. Given how much a DAW changes per-frame when animating, it's probably close to pathological anyway.

In git, there is support for something akin to swap-chains so that you can use DMA-BUF textures (or similar, such as IOSurface), and specify the damage between each buffer. This is useful for things like virtual machines or other external plumbing where you have a buffer+damage to integrate into GTK's composition pipeline and don't want full-frame renders to result due to gsk_render_node_diff() falling over.


Thank you for sharing this. I'm not familiar with how GTK renders under the hood as compared to other pieces of software.


> I'm not sure even operating systems do

Oddly enough Windows used to be a lot better at what you describe than it is (in practice) today. I've used old systems where it took 0.5sec to clear the screen, and you could see the color sweep down the monitor from top to bottom. Now THAT is slow. Because of that the OS used a lot of tricks to minimize drawing, such as xoring the caret, and precisely tracking dirty regions to minimize the number of pixels drawn, etc.

Much of that finesse has been lost in this new generation of GUIs built on 3D APIs where the entire window is thrown away and redrawn every frame.


> Much of that finesse has been lost in this new generation of GUIs built on 3D APIs where the entire window is thrown away and redrawn every frame.

GTK 4 uses eglSwapBuffersWithDamage() scissor clipped to the union of the damage rectangles. Operations falling outside that clip are culled.

Only the damage areas are composited, assuming a competent compositor.


Which is why on modern Windows one would use the compositor APIs as well; however, that implies some COM fun, which could be great if WinDev weren't stuck in their ways on how to provide tooling for it.


I think you're right. It used to be practice to limit your invalidation rect/region based on known bound information. Now every developer treats their entire buffer as a cheap throwaway and they couldn't be further from the truth.


It's still done. JavaFX does dirty region tracking, for example, here:

https://github.com/openjdk/jfx/blob/master/modules/javafx.gr...

and that's a toolkit that's still relatively modern (it was ahead of its time). For example the UI thread is separated from the rendering thread, so one core can be computing the layout/processing events/doing animations for one frame whilst another core renders another frame to a GL/D3D command stream.


> Pixel fill is expensive.

And how. People don't realize that even with modern graphics card bandwidths, windows that are a good fraction of a 4K monitor take a remarkable amount of bandwidth to clear and fill.


Thank you! Man, if you could get people who work on UIs to write a shader once in their software career and tell them, hey how do you get a nice Gaussian blur to draw fast on a 4/5/8k monitor, they'll quickly realize you need to do as little drawing as possible.


Gaussian blur is "easy": you can decompose it into a vertical and a horizontal component and parallelize and vectorize most of the task.

But that's a special case. If you have to actually fill the buffer the traditional way, it takes an eternity.

Which is why I'm so disappointed by Gtk's decision to say "just draw at 6K at 2x if you want 4K at 1.5x, scaling is cheap". Because it's definitely not.
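A toy sketch of the decomposition, with a 3-tap kernel on a grayscale buffer (a real blur uses a wider kernel, but the structure is the point: two 1D passes instead of one 2D convolution, so O(2k) work per pixel instead of O(k^2)):

    /// One 1D convolution pass over a w*h grayscale image.
    /// The horizontal pass convolves along rows, the vertical along columns.
    fn pass(src: &[f32], dst: &mut [f32], w: usize, h: usize,
            horizontal: bool, kernel: &[f32]) {
        let r = (kernel.len() / 2) as isize;
        let (len, lines) = if horizontal { (w, h) } else { (h, w) };
        for line in 0..lines {
            for i in 0..len {
                let mut acc = 0.0;
                for (j, k) in kernel.iter().enumerate() {
                    // clamp sampling at the image edges
                    let t = (i as isize + j as isize - r)
                        .clamp(0, len as isize - 1) as usize;
                    acc += k * if horizontal { src[line * w + t] } else { src[t * w + line] };
                }
                dst[if horizontal { line * w + i } else { i * w + line }] = acc;
            }
        }
    }

    fn main() {
        let (w, h) = (8, 8);
        let mut img = vec![0.0_f32; w * h];
        img[3 * w + 3] = 1.0;           // a single bright pixel
        let kernel = [0.25, 0.5, 0.25]; // tiny Gaussian-ish kernel, sums to 1
        let (mut tmp, mut out) = (vec![0.0; w * h], vec![0.0; w * h]);
        pass(&img, &mut tmp, w, h, true, &kernel);  // horizontal
        pass(&tmp, &mut out, w, h, false, &kernel); // vertical
        println!("{:?}", &out[3 * w..4 * w]);       // the blurred row
    }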


They'd probably answer: you need to be using 2D FFT convolution. ;-P


When I was searching for a very fast Gaussian blur algorithm to implement into my 2d canvas library, I stumbled across some code[1] which does a magnificent job by using something called an Infinite Impulse Response[2] filter - something that comes from the world of audio, I believe?

The best bit from my POV is that the work is all done on the CPU with no need to care about webGL/shaders/etc.

[1] - https://github.com/nodeca/glur/blob/master/index.js

[2] - https://en.wikipedia.org/wiki/Infinite_impulse_response


Is there somewhere that has a good guide to understanding why this is the case? I'd love to learn more about it. I imagine there's a lot more to it than a gpu's pixel fill rate, which from the marketing should be sufficient but obviously from this discussion is not.


You really don't feel this unless you do graphics programming and then it hits you like a ton of bricks.

If you want a beginner's introduction to pixel fill impact, and say, you're a web developer, try picking up a friendly graphics framework like Phaser, LÖVE, or something comparable and writing a shader that's applied to a fullscreen framebuffer, and measure the frametime that it takes to simply draw that as compared to a texture without a "default" fragment shader.


That really doesn't explain the why though. Why is pixel fill so slow? You'd expect the opposite given the numbers on graphics cards. What's the disconnect here?


Honestly, it's not "slow", it's just there are a LOT of pixels.

Number of pixels goes up roughly as the square of resolution. A 4K monitor is 4x the pixels of a 1080p monitor. 8K is 16x the pixels of a 1080p monitor.

My current browser window is probably more pixels than 2 1080p monitors.

Doing the math: 1920x1080x2x4bytes(RGBA) = roughly 16 megabytes

On a 3070 (a quite solid graphics card) you get roughly 450GB/s of memory bandwidth. That's if everything is up, running, powered, and in order.

You wind up with 16MB/450000MB/s = 35uS for a simple window clear. Nothing clever just a pure write clear. That's 3+ orders of magnitude off from CPU operations.

If you have to do a blend op, you have 3x--read byte, operate on byte with other data and write back. You've just crossed 100uS for the simplest blend--if you get all the memory interleaved correctly AND nothing has to leave the graphics card (if you have to transfer between the card and CPU you can kiss your time budget goodbye).

It's one of the reasons why "compositing" is such a PITA. And, to be honest, I suspect compositing window managers are a dead end. Do we really need animated window opening and closing effects? Do windows really need to be transparent so you can see something behind? Do we really need greyscale, blended drop shadows on the edges? Do we really need to render the contents of the window while resizing it? The mobile space answers "Oh, hell, no." The tiling window manager users also concur.

8K is just going to make this kind of stuff disastrously obvious.


From your description I think that .NET WPF[0] has the same architecture?

[0]: https://learn.microsoft.com/en-us/dotnet/desktop/wpf/overvie...


I don't doubt they do something similar, or perhaps they inherit Win32 invalidation somehow, or build on top of it.


For the audio plugin side of things, I wrote a bit about the kinds of optimizations we do: https://dplug.org/tutorials/Dplug%20Tutorials%2011%20-%20UI%... People have come to expect much more UI feedback than before, and this has a CPU cost.


There’s also the option of dumping the cross-platform requirement. Which platform is most important to the author? Then narrow down further by deciding on which configuration of that platform matters. Define those things, then build the best DAW possible.

In my mind, this doesn’t rule out web-based stuff because I view the web as its own distinct platform.


Tom Ellard from industrial band Severed Heads wrote "A Completely Biased Guide to Digital Audio Workstations", which has some pointed comments on DAW interfaces:

https://nilamox.com/a-completely-biased-guide-to-daws/


It's weird to see the impressions of someone who has apparently never used the equipment used in music production trying to talk intelligently about said devices. So many incorrect assumptions; it's really painful to read as someone acquainted with audio engineering. An interesting window into how the devs see it, nonetheless.


You should comment on the specific issues you see in the article.


Huh? The author wasn’t talking about audio engineering, so I don’t see how that is relevant. The GUI programming issues they’re discussing would be true for any realtime application. The fact they are building a DAW is a bit beside the point otherwise.


I find the article actually refreshingly dry. The author acknowledges and breaks down a large series of interface requirements for a music production tool and approaches it from a pure development point of view.

I wonder what are the "many incorrect assumptions" you see. I didn't notice any attempt to judge or discredit any of the requirements, he merely described them in a very sober way.

For someone very familiar with audio equipment it was a nice read for me. It's like reading a summary of user expectations to switches and knobs in a car, from a pure GUI development point of view.


the comments about the layout of “mixing tracks” not following design guidelines or being a standard list/tree/etc struck me.

in traditional audio engineering we call them channel strips and their layout reflects the signal flow - eq knobs are typically at the top, followed by compression, fx, and so on.


Naming conventions are a mixed bag (hehe) in DAWs, since not everyone who dabbles in them is an audio engineer by trade.

- Ableton calls them "mixer controls": https://www.ableton.com/en/manual/mixing/

- Bitwig's manual does mention channel strips: https://www.bitwig.com/userguide/latest/the_mix_view/#channe...

- Adobe Audition has "track controls": https://helpx.adobe.com/audition/using/multitrack-editor-ove...

- Ardour... probably has mixing board tape and felt pen marker emulation as well, I'm late and couldn't be bothered to look up its manual.


I think he meant that they don’t follow typical design guidelines for graphical user interfaces on a computer. Which they don’t — outside DAWs (and maybe some image editors?), I’ve never seen a GUI arranged like a channel strip, where controls are stacked on top of each other but not necessarily arranged in a perfect grid. A set of controls like e.g. [0] in Ableton is highly non-standard, compared to the kinds of layouts most GUI toolkits provide.

[0] https://rekkerd.org/img/201210/ableton_live9.png


It seems like the level of experience of the average poster here has been decreasing steadily over time.


I'm sure there's a clever term for it, but a lot of developers (and this is probably a timeless thing) think they can dive into any industry and make a difference, or think that because they see things from a different point of view they'll therefore do it better.

I mean it has a basis in reality, loads of people here will have joined companies that were years behind, companies in new industries. Myself I've gone from cable TV to investment banking to public transit to B2B e-commerce to telcoms to the energy industry, but nowhere did I assume to think I knew better. Not when it came to their main industry anyway, software development and practices, sure, that's kind of my jam.

When it comes down to it, every job is just data in, data out, localization and date / number formatting. And trying to get my colleagues to not overcomplicate things, like come on, it's just a simple list view, if you're bored by that you're on the wrong assignment.


I have recently been using threejs (as in React-three-fiber). In the process I found myself wondering why I have been using html/css/react all this time. And if you think that is a dumb thought, let me tell you I do too. But it just won't go away.

I do understand the useFrame() issue, but still ...


I'm curious as to why the author thinks that the fltk-rs rust bindings to fltk are "practically unusable." I've been using them and while there is a definite learning curve they seem complete, stable, and performant. Also the author is extremely helpful and responsive!


My recommendation: try Delphi/Lazarus. It is trivial to develop new GUI components in Delphi/Lazarus and there are already very good GUI component libraries for Lazarus available as packs.


For the relative mouse movements, can't you just move the (hidden) mouse pointer to the centre of the screen every time you get a movement event? It will be very hard to hit the edge that way.
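Something like this, assuming a winit 0.28-style API (a sketch; details vary between winit versions):

    // Cargo.toml (assumed): winit = "0.28"
    use winit::{
        dpi::PhysicalPosition,
        event::{Event, WindowEvent},
        event_loop::EventLoop,
        window::WindowBuilder,
    };

    fn main() {
        let event_loop = EventLoop::new();
        let window = WindowBuilder::new().build(&event_loop).unwrap();
        window.set_cursor_visible(false); // hide the pointer while dragging a knob

        event_loop.run(move |event, _, _control_flow| {
            if let Event::WindowEvent {
                event: WindowEvent::CursorMoved { position, .. }, ..
            } = event
            {
                let size = window.inner_size();
                let centre =
                    PhysicalPosition::new(size.width as f64 / 2.0, size.height as f64 / 2.0);
                // relative movement since the last warp
                let (dx, dy) = (position.x - centre.x, position.y - centre.y);
                let _ = (dx, dy); // ...apply to the dragged control here...
                // warp the hidden pointer back so it never reaches a screen edge
                let _ = window.set_cursor_position(centre);
            }
        });
    }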


What about using the graphic bits from WDL? https://www.cockos.com/wdl/


Building user experiences is inherently some of the most difficult software engineering there is.

It's not difficult because the existing tools are poorly made (though most of them are), it's difficult because the core problem is extremely complex (in the Rich Hickey [1] sense of the word, e.g., braided, intertwined, complected, etc.).

In order to build a user experience, we start with some information that can often (though certainly not always) appear both simple and easy (i.e., unbraided and comprehensible). But then we must create a projection of that information onto a 2D (or 3D) plane and manage its changing state over time.

This is where everything falls apart. We're usually trying to manage a very wide, very deep, loosely defined tree of state and we're trying to reflect an arbitrary number of user interactions and mutations instantly, and over time.

Exactly zero of our tools (programming languages, UI toolkits, Operating Systems, etc.) have strong support for managing change-over-time in a way that makes this type of problem fast, stable or enjoyable to present, use, verify or mutate.

I love that the OP showed the triangle of Performance, Adaptability and Velocity. I've been framing a similar concept with teams since 2011 which I refer to as, "Fast, Stable and Delightful."

It's my thesis that we need to balance these three requirements in both our activities and in the output that we produce. This balance is incredibly difficult to achieve and harder still to maintain.

Different organizations tend to focus internal power in one of these 3 nodes (e.g., Google on Stable, Facebook on Moving Fast, and Apple on Delight). Note that the desire to move "Fast," or even to be "Stable" rarely achieves the desired result.

The only organization I've seen manage these values effectively is Apple and I believe this is because people there have understood for decades that speed and stability are prerequisites for delight, but that neither speed, nor stability (alone or together) are sufficient to make something that gives humans the sense that someone, somewhere actually cares about how they feel.

I love how the OP is exploring this problem space and how they are digging into the existing solutions and trying to define the problem with some rigor. This is fertile ground for exploring new ideas and new approaches. Please do not discourage this activity!

[1] Simple Made Easy: https://www.youtube.com/watch?v=SxdOUGdseq4


The DAW the post refers to: https://meadowlark.app



You would want to use a canvas element for custom UI. A canvas is backed by a hardware-accelerated 2D library (for example Skia) where you call functions to draw lines and curves. To make a decibel gauge, you would keep the background face image in a buffer, then only redraw the needle line on top of it each frame.
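The caching idea is independent of the particular canvas API; a toy sketch of it in Rust over a raw ARGB pixel buffer, with a hypothetical present() standing in for the final blit:

    /// Toy sketch: the gauge face is rendered once; each frame is then a
    /// memcpy of the cached face plus one needle, not a full re-render.
    fn draw_needle(buf: &mut [u32], w: usize, h: usize, angle: f32) {
        let (cx, cy) = (w as f32 / 2.0, h as f32 - 1.0);
        let len = h as f32 * 0.8;
        for t in 0..len as usize {
            let x = (cx + angle.cos() * t as f32) as usize;
            let y = (cy - angle.sin() * t as f32) as usize;
            if x < w && y < h {
                buf[y * w + x] = 0xFFFF_0000; // needle pixel (ARGB red)
            }
        }
    }

    fn main() {
        let (w, h) = (256, 128);
        // pretend this is the expensive pre-rendered gauge face
        // (ticks, labels, arc), drawn once at startup or on resize
        let face = vec![0xFF20_2020u32; w * h];
        let mut frame = vec![0u32; w * h];
        for step in 0..60 {
            frame.copy_from_slice(&face); // cheap blit, no re-render
            let angle = 0.6 + step as f32 * 0.03; // needle position from the dB value
            draw_needle(&mut frame, w, h, angle);
            // present(&frame); // hypothetical: hand the buffer to the canvas/window
        }
    }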


Title should be changed to Rust GUI Development Struggles.


Should have gone console-based, keyboard controlled


Some comments on the issues mentioned at the beginning of the article as I had the same questions in https://ossia.io:

- DAWs have a lot of toolbars and panels (browser panel, timeline panel, piano roll panel, fx rack panel, mixer panel, audio editor panel, automation editor panel, settings panel, etc.).

for this Qt always performed brilliantly for me. I went from QDockWidget to a QSplitter-based layout though.

- Some widgets like decibel meters and other visualizers are constantly being animated, meaning the GUI library needs to efficiently redraw the screen every frame.

all the CPU-based toolkits will only redraw what changed though

- In addition, visualizers can be expensive to render on the CPU (especially spectrograms/spectrometers). Ideally you should use custom shaders to render them on the GPU.

fair, but then you may pay the cost of a GPU -> CPU transfer which isn't always free

- Clips on the timeline are notoriously expensive to render. There needs to be some way to cache the contents of clips into a texture (Either directly or by making use of the GUI library's "damage tracking" which I'll get into later.) Audio clips are the biggest culprit, because rendering waveforms requires the CPU to first do a linear search through the source material for peak values, and then render the waveform pixel-by-pixel (or even better use custom shaders to send commands to the GPU).

AFAIK a lot of DAWs perform a full scan when the file is loaded and save the result in a database or in a file next to it (e.g. Reaper's .reapeaks, Ableton's .asd...) so that you don't need to perform a complete rescan. In ossia.io I use three different algorithms / display methods depending on the zoom level used: at "far away" zoom it uses the minmax of the audio slices, at intermediary zoom it draws lines and when getting closer, it starts drawing individual samples.

https://github.com/ossia/score/blob/master/src/plugins/score...
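the reduction itself is simple - a sketch of the minmax idea for a single zoom level (the cached variants just precompute this at several resolutions, mipmap-style):

    /// One (min, max) pair per pixel column, so a whole clip draws as one
    /// vertical line per column instead of thousands of samples.
    fn minmax_peaks(samples: &[f32], samples_per_pixel: usize) -> Vec<(f32, f32)> {
        samples
            .chunks(samples_per_pixel.max(1))
            .map(|c| {
                c.iter()
                    .fold((f32::MAX, f32::MIN), |(lo, hi), &s| (lo.min(s), hi.max(s)))
            })
            .collect()
    }

    fn main() {
        let wave: Vec<f32> = (0..48_000).map(|i| (i as f32 * 0.01).sin()).collect();
        // one second of audio collapses to 500 columns at this zoom
        let peaks = minmax_peaks(&wave, 96);
        println!("{} columns", peaks.len());
    }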

- Automation clips can contain a bunch of bezier curves, which are slow to render.

Convert those to line segments with an approximation setting that looks good enough and it'll be ten times faster (keep the bezier for your data model of course).

https://github.com/ossia/score/blob/master/src/plugins/score...

https://github.com/ossia/score/blob/master/src/plugins/score...
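a sketch of the flattening, with uniform sampling for simplicity (adaptive subdivision is what gives you the quality/speed knob):

    type P = (f32, f32);

    /// Flatten a cubic bezier into n line segments by uniform sampling;
    /// adaptive subdivision would spend points where the curvature is.
    fn flatten_cubic(p0: P, p1: P, p2: P, p3: P, n: usize) -> Vec<P> {
        (0..=n)
            .map(|i| {
                let t = i as f32 / n as f32;
                let u = 1.0 - t;
                // cubic Bernstein basis
                let x = u * u * u * p0.0 + 3.0 * u * u * t * p1.0
                    + 3.0 * u * t * t * p2.0 + t * t * t * p3.0;
                let y = u * u * u * p0.1 + 3.0 * u * u * t * p1.1
                    + 3.0 * u * t * t * p2.1 + t * t * t * p3.1;
                (x, y)
            })
            .collect()
    }

    fn main() {
        // an automation segment from (0,0) to (1,1) with an S-shaped ease
        let pts = flatten_cubic((0.0, 0.0), (0.4, 0.0), (0.6, 1.0), (1.0, 1.0), 16);
        println!("{} points -> {} segments", pts.len(), pts.len() - 1);
    }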

- Piano roll clips can contain lots of little rectangles in order to display a "minimap" of the MIDI notes inside of it.

oh damn yes, I spent so much time on this and it still needs so much optimizing... if someone wants to give a shot at it :D

https://github.com/ossia/score/blob/master/src/plugins/score...

https://github.com/ossia/score/blob/master/src/plugins/score...

- On top of all this, clips can contain text labels which can also be expensive to render.

Yep, made myself a few "cached text" Qt items over time as the builtin cache wasn't satisfactory

https://github.com/ossia/score/blob/master/src/lib/score/gra...

- The fact that a timeline is zoom-able also makes it harder to cache the rendering of clips. If the timeline changed its zoom level, all visible clips pretty much have to redraw all of their contents.

yep

- Piano rolls can also be expensive to render if there is a bunch of MIDI notes, especially if there are text labels on the notes.

yep

- If the user clicks on a folder in a sample browser containing hundreds or even thousands of files, allocating a label widget for each file in the browser list will be very expensive. Something like the list factory in GTK is needed here.

Yep, and Qt is also able to cache this. For Qt's QFileSystemModel, though, I carry a small patch to disable any kind of sorting when there are more than a few hundred thousand files (which happens with large media libraries): https://github.com/jcelerier/qtbase/commit/9909c3c7902cf7a2b... I also worked a bit with upstream Qt to get this improved (for instance, it was regenerating regexes ALL THE TIME when filtering for specific file extensions). A minimal model/view sketch follows.
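
A minimal sketch of what the model/view approach buys you: the view calls data() only for the rows it actually paints, so no per-file widget is ever allocated. The model and file names here are made up for illustration:

    #include <QtWidgets>
    #include <utility>

    // No widget per row: the view asks data() only for visible rows.
    class FileListModel : public QAbstractListModel {
    public:
        explicit FileListModel(QStringList files) : m_files(std::move(files)) {}
        int rowCount(const QModelIndex& = {}) const override { return m_files.size(); }
        QVariant data(const QModelIndex& idx, int role) const override {
            if (role == Qt::DisplayRole && idx.isValid())
                return m_files.at(idx.row());
            return {};
        }
    private:
        QStringList m_files;
    };

    int main(int argc, char** argv) {
        QApplication app(argc, argv);
        QStringList files;
        for (int i = 0; i < 100000; ++i)
            files << QStringLiteral("sample_%1.wav").arg(i);
        FileListModel model(files);
        QListView view;
        view.setModel(&model);
        view.setUniformItemSizes(true);  // lets the view skip per-row measuring
        view.show();
        return app.exec();
    }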

- We want to reserve as much CPU as possible for the actual audio processing. Ideally the GUI shouldn't take up more than one or two CPU threads.

- On some platforms, we also need to make sure there's actually enough CPU left for 3rd-party plugins to render their GUIs.

yep


You could just buy Ableton for $500 and save yourself the dev hassle. It’s already a solved problem. Why reinvent the wheel?


What a toxic and stupid comment. Hey guys, Windows already exists, why build another operating system? /s


Yeah, but the programmer's virtues of laziness and hubris, as outlined by ESR and Larry Wall, say that you don't need to keep reinventing tools that already exist. There are better things to program. This dev isn't going to build a better Ableton on his own, and this site now simply lists all the reasons he can't, not problems to solve on his own. It will take orders of magnitude more than $500 of his time to replicate anything close to Ableton's functionality. I wasn't being toxic at all, just pragmatic. Pragmatism is part of knowing when to program, and when not to.


Right, so like I said in my previous comment - you are the equivalent of someone telling Linus Torvalds "Why bother creating another operating system? MS-DOS already exists".

As someone who's used various DAWs extensively, Ableton Live most certainly is not perfect. It has a terrible piano roll (see FL Studio for a good one). Many have felt the same way, but they don't fix it and there's nothing you can do about it because it's closed source.

Of course building a DAW is extremely difficult. But the fact that a closed source version already exists doesn't mean it's not an endeavor worth pursuing. I personally would love for there to be a quality open source DAW comparable to the commercial ones (if I were wealthy it'd definitely be something I'd bankroll).


Ableton isn't open source. There are also several other DAWs, and Ableton isn't some pinnacle of software engineering. Take Larry and ESR, and anyone else, with a pinch of salt, because accepting some proprietary implementations of DAWs as a tombstone on the FOSS development of DAWs is reductionist to the point of being totally wrong.

Why would you bother posting such a comment?

Your notion that all the questions of how to do the manifold tasks of a DAW are resolved is specious, and you couldn't possibly prove it to be true, because it's false. Ableton has managed to make a product that solves the DAW questions in one particular way, and it's a black box to boot, so that solution is not in the public domain the way Meadowlark is.

The technical questions entailed in marrying Rust to GUIs are EXTREMELY UNSOLVED AND RELEVANT RIGHT NOW! Let me tell you: state, state, state.

I can't even propose that you work for Ableton as they are classier than this! Their product is a good and useful DAW for many people, you bet. Your point is?


Agreed, but ironically I find your first sentence to be self-describing / equally against the guidelines.


What's the solved problem you are referring to?



