> For example, the latest MacBook Pro can have more than 60 GB of unified memory available to the GPU for storing model weights, plus a reasonably powerful GPU to run many workloads.
...for $3.5K minimum, according to the Apple website :/
Is there any chance WebGPU could utilize the matrix instructions shipping on newer/future IGPs? I think MLIR can do this through Vulkan, which is how SHARK is so fast at Stable Diffusion on the AMD 7900 series, but I know nothing about WebGPU's restrictions or Apache TVM.
Dawn and WebIDL are also an easy way to add GPU support to any application that can link C code (or use it via a library). And Google maintains the compiler layer for the GPU backends (Metal, DX, Vulkan, ...). This is going to be a great leap forward for GPGPU in many apps.
It's also interesting that this opens up fully saturating Apple Silicon (minus the ANE): GGML can run on the CPU, using NEON and AMX, while another instance runs via Metal on the GPU using MLC/Dawn. Though the two can't share the same memory at the moment.
The GPU's energy per ML task is so much lower that you'd probably get better performance running everything on the GPU.
I think some repos have tried splitting things up between the NPU and GPU as well, but they didn't get good performance out of that combination? Not sure why, as the NPU is very low power.
Technically any newish laptop with 64GB of RAM has 64GB of "VRAM," but right now the Apple M series and AMD 7000 series are the only IGPs with any significant ML power.
I’m not sure what you mean. Typically, an iGPU slices off part of RAM for the GPU at boot time, which means it’s fixed and not shared. When did this change?
For Intel, per their FAQ at https://www.intel.com/content/www/us/en/support/articles/000... (see the chart under "What is the maximum amount of graphics memory or video memory my computer can use?" and the discussion under "Will adding more physical memory increase my graphics memory amount?"), the iGPUs in 5th-gen/Broadwell processors were their first to do so, in 2014.
Full unified memory came about 10 years ago (also powering the PS4), but I think the hardware ability to adjust iGPU memory without rebooting predates that; Intel seems to have called it DVMT.