I suspect this article may even be underestimating the impact of WebGPU. I'll make two observations.
First, for AI and machine learning type workloads, the infrastructure situation is a big mess right now unless you buy into the Nvidia / CUDA ecosystem. If you're a researcher, you pretty much have to, but increasingly people will just want to run models that have already been trained. Fairly soon, WebGPU will be an alternative that more or less Just Works, although I do expect things to be rough in the early days. There's also a performance gap, but I can see it closing.
Second, for compute shaders in general (potentially accelerating a large variety of tasks), the barrier to entry falls dramatically. That's especially true on web deployments, where running your own compute shader costs somewhere around 100 lines of code. But it becomes practical on native too, especially in Rust, where you can just pull in a wgpu dependency.
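To make the "around 100 lines" figure concrete, here's a minimal sketch of the web path in TypeScript (assuming a browser with WebGPU enabled and the WebGPU type definitions available); the doubling shader and buffer setup are made up purely for illustration:

    const shaderSource = /* wgsl */ `
      @group(0) @binding(0) var<storage, read_write> data: array<f32>;

      @compute @workgroup_size(64)
      fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
        if (gid.x < arrayLength(&data)) {
          data[gid.x] = data[gid.x] * 2.0;
        }
      }
    `;

    async function runDouble(input: Float32Array): Promise<Float32Array> {
      const adapter = await navigator.gpu.requestAdapter();
      if (!adapter) throw new Error("WebGPU not available");
      const device = await adapter.requestDevice();

      // Storage buffer the shader reads and writes.
      const storage = device.createBuffer({
        size: input.byteLength,
        usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
      });
      device.queue.writeBuffer(storage, 0, input);

      // Staging buffer for reading the result back on the CPU.
      const readback = device.createBuffer({
        size: input.byteLength,
        usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
      });

      const pipeline = device.createComputePipeline({
        layout: "auto",
        compute: { module: device.createShaderModule({ code: shaderSource }), entryPoint: "main" },
      });
      const bindGroup = device.createBindGroup({
        layout: pipeline.getBindGroupLayout(0),
        entries: [{ binding: 0, resource: { buffer: storage } }],
      });

      // Record and submit the dispatch, then copy the result out for readback.
      const encoder = device.createCommandEncoder();
      const pass = encoder.beginComputePass();
      pass.setPipeline(pipeline);
      pass.setBindGroup(0, bindGroup);
      pass.dispatchWorkgroups(Math.ceil(input.length / 64));
      pass.end();
      encoder.copyBufferToBuffer(storage, 0, readback, 0, input.byteLength);
      device.queue.submit([encoder.finish()]);

      await readback.mapAsync(GPUMapMode.READ);
      const result = new Float32Array(readback.getMappedRange().slice(0));
      readback.unmap();
      return result;
    }

The native Rust path with wgpu follows essentially the same shape, nearly call for call.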
As for text being one of the missing pieces, I'm hoping Vello and supporting infrastructure will become one of the things people routinely reach for. That'll get you not just text but nice 2D vector graphics with fills, strokes, gradients, blend modes, and so on. It's not production-ready yet, but I'm excited about the roadmap.
[Note: very lightly adapted from a comment at cohost; one interesting response was by Tom Forsyth, suggesting I look into SYCL]
WebGPU has no equivalent to tensor cores to my understanding; are there plans to add something like this? Or would this be "implementation sees matmul-like code; replaces with tensor core instruction"? For optimal performance, my understanding is that you need tight control of e.g. shared memory as well -- is that possible with WebGPU?
On NVIDIA GPUs, flops without tensor cores are roughly 1/10th of flops with tensor cores, so this is a pretty big deal for inference and definitely for training.
Shared memory, yes, with the goodies: atomics and barriers. We rely on that heavily in Vello, so we've pushed very hard on it. For example, WebGPU introduces the "workgroupUniformLoad" built-in, which lets you broadcast a value to all threads in the workgroup while not introducing potential unsafety.
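For a rough picture of what that built-in buys you (hypothetical WGSL, shown here as a TypeScript string so it could go straight into createShaderModule), the broadcast pattern looks something like this:

    const broadcastShader = /* wgsl */ `
      var<workgroup> tile_start: u32;

      @compute @workgroup_size(256)
      fn main(@builtin(local_invocation_id) lid: vec3<u32>,
              @builtin(workgroup_id) wid: vec3<u32>) {
        if (lid.x == 0u) {
          // One thread produces the value...
          tile_start = wid.x * 256u;
        }
        // ...and workgroupUniformLoad acts as a barrier plus a uniform
        // broadcast, so every thread observes the same value without
        // risking non-uniform control flow hazards.
        let start = workgroupUniformLoad(&tile_start);
        // ...use `start` uniformly across the workgroup from here on.
      }
    `;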
Tensor cores: I can't say there are plans to add it, but it's certainly something I would like to see. You need subgroups in place first, and there's been quite a bit of discussion[1] on that as a likely extension post-1.0.
I haven't tried it myself, but it looks like several projects are already looking at implementing machine learning with WebGPU, and that this is one of the goals of WebGPU. Some info I found:
This is the discussion I hoped to find when clicking on the comments.
> Fairly soon, WebGPU will be an alternative...
So while the blog focused on the graphical utility of WebGPU, the underlying implementation is currently about letting websites/apps interface with the GPU in a more direct and advantageous way to render graphics.
But what you're suggesting is that in the future new functionality will likely be added to take advantage of your GPU in other ways, such as training ML models and then using them via an inference engine all powered by your local GPU?
Is the reason you can't accomplish that today bc APIs haven't been created or opened up to allow such workloads? Are there not lower level APIs available/exposed today in WebGPU that would allow developers to begin the design of browser based ML frameworks/libraries?
Was it possible to interact with the GPU before WebGPU via Web Assembly?
Other than ML and graphics/games (and someone is probably going to mention crypto), are there any other potentially novel uses for WebGPU?
> ...in the future new functionality will likely be added to take advantage of your GPU in other ways, such as training ML models and then using them via an inference engine all powered by your local GPU?
Yes.
> Is the reason you can't accomplish that today bc APIs haven't been created or opened up to allow such workloads? Are there not lower level APIs available/exposed today in WebGPU that would allow developers to begin the design of browser based ML frameworks/libraries?
That is correct: before WebGPU there was no way to access the compute capability of GPU hardware through the Web. There have been some hacks based on WebGL, but those are seriously limited. The fragmentation of the existing API space is a major reason we haven't seen as much progress on this.
> Was it possible to interact with the GPU before WebGPU via Web Assembly?
Only in limited ways through WebGL - no access to workgroup shared memory, no ability to do random-access writes to storage buffers, etc.
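To illustrate the difference, here's a hypothetical WGSL kernel (again just a TypeScript string) using exactly the two capabilities mentioned above - workgroup shared memory with atomics, and random-access writes into a storage buffer - neither of which WebGL shaders can express:

    const histogramShader = /* wgsl */ `
      @group(0) @binding(0) var<storage, read> input: array<u32>;
      @group(0) @binding(1) var<storage, read_write> histogram: array<atomic<u32>, 256>;

      // Workgroup shared memory, zero-initialized for each workgroup.
      var<workgroup> local_hist: array<atomic<u32>, 256>;

      @compute @workgroup_size(256)
      fn main(@builtin(global_invocation_id) gid: vec3<u32>,
              @builtin(local_invocation_id) lid: vec3<u32>) {
        // Accumulate into fast workgroup-shared memory first...
        if (gid.x < arrayLength(&input)) {
          atomicAdd(&local_hist[input[gid.x] & 255u], 1u);
        }
        workgroupBarrier();
        // ...then each thread flushes one bin with a random-access
        // write into the storage buffer.
        atomicAdd(&histogram[lid.x], atomicLoad(&local_hist[lid.x]));
      }
    `;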
> Other than ML and graphics/games (and someone is probably going to mention crypto), are there any other potentially novel uses for WebGPU?
Yes! There is research on doing parallel compilers on GPU (Aaron Hsu's co-dfns as well as Voetter's work[1]). There's quite a bit of work on implementing Fourier transforms at extremely high throughput. Obviously, physics simulations and other scientific workloads are a good fit. To me, it feels like things are wide open.
> Was it possible to interact with the GPU before WebGPU via Web Assembly?
Only with tons of restrictions by going through WebGL, or, outside the browser, by writing your own WASM runtime (or extending an existing one) that connects to system-native GPU APIs like D3D12, Vulkan or Metal.
> are there any other potentially novel uses for WebGPU
In general, the only thing that WebGPU has over calling into D3D12, Vulkan or Metal directly is that it provides a common subset of those 3 APIs (so you don't need to write 2 or 3 implementations to cover all popular operating systems) and it's easier to use (at least compared to D3D12 and Vulkan).
WebGPU is an important step in the right direction, but not a 'game changer' per se (at least outside the browser).
Could someone explain what kinds of useful things will become possible with it?
I don't get it yet, but HN seems excited about it, so I'd like to understand it.
What I get so far - running models that fit on consumer-sized GPUs will become easier, because users won't need to download a desktop app to do so. This is limited for now by the lack of useful models that can run on consumer GPUs, but we'll get smaller models in the future. And it'll be easier to make visualizations, games, and VR apps in the browser. Is that right, and what other popular use cases am I missing where people currently have to resort to WebGL or to building a desktop app?
It's not so much that people want to run the models in the browser. They want to be able to write and publish one desktop app that e.g. runs a LLaMA quality model and runs decently across the hundreds of different GPUs that exist on their users' machines.
If this is something users actually care about, wouldn't Vulkan already be a big deal? It has way more features for AI than WebGPU will likely ever have, and it's widely available today.
It spends a little time on the topic, and imho doesn't argue the point very well.
It points out that MoltenVK doesn't work very well, and I agree, although I think it's questionable whether the Molten-compatible subset is actually worse than WebGPU.
It says that Vulkan has driver issues on Linuxes, but provides no citations for that, and in my experience, Vulkan drivers and developer tools have been the best on Linux overall (my experience is limited to Nvidia, so maybe that's why, I don't know).
This fixes the main problem with Vulkan, which is that there were no big tech companies pushing it. WebGPU has Apple, Google and Microsoft all committing to support it in their browser/OS.
Not quite. The main problem with Vulkan is, as the blog post goes into, one of usability. Vulkan isn't designed for end developers to use, it's designed for middleware vendors to use. Vulkan is already being pushed by many big tech companies, including Google, Nvidia, AMD, etc. It's really only Apple that's a problem here.
WebGPU is basically middleware for those who just want to use graphics without pulling in all of Unity, Unreal, etc.
The Vulkan situation on Apple is incredibly funny, especially once you look at the landscape of gaming. There are games out there (I believe Final Fantasy XIV is like this) where the official Mac client runs in Wine, using a DirectX -> Vulkan translation layer, on top of a Vulkan -> Metal translation layer.
I'm not sure this is accurate. More to the point, I'm not sure these are the right companies to do the pushing.
Many companies have tried to unseat Nvidia. Many of them big companies. There is no magic in the ones you named. Worse, I'm not convinced that Google will bring stability to the project. So I see little reason to be overly excited.
Coming at it from a graphics processing perspective, working on a lot of video editing, it's annoying that just as GPUs were starting to become affordable, with people turning their backs on cryptobro idiocy and no longer chasing the Dunning-Krugerrand, they've started to get expensive again because people want hardware-accelerated Eliza chatbots.
Anyway your choices for GPU computing are OpenCL and CUDA.
If you write your project in CUDA, you'll wish you'd used OpenCL.
If you write your project in OpenCL, you'll wish you'd used CUDA.
I've never once regretted my decision to write GPU code in CUDA... I mean, I wish there were alternatives because being locked into Nvidia isn't fun, but CUDA is a great developer experience.
The thought also came to mind, but after listening to the work of Neural Magic on Practical AI [1], and seeing how model quantization on CPU is advancing by leaps and bounds, I don't foresee our strong dependence on CUDA persisting, even in the near future.