It's also interesting that this opens up the full saturation of Apple Silicon (minus the ANE): GGML can run on the CPU, using NEON and AMX, while another instance could run via Metal on the GPU using MLC/Dawn. The two couldn't share the same memory at the moment, though.
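A rough sketch of what "two instances at once" could look like: two workers pull prompts from one shared queue, so whichever backend is free takes the next job. Everything here is hypothetical. `run_backend` and `dispatch` are made-up names, and the line that records a result stands in for a real GGML or MLC inference call, which this sketch does not attempt to model.

```python
import queue
import threading


def run_backend(name, jobs, results):
    """Drain the shared job queue, tagging each result with the backend name."""
    while True:
        try:
            prompt = jobs.get_nowait()
        except queue.Empty:
            return  # no work left for this backend
        # Placeholder: a real worker would call into its inference engine here.
        results.append((name, prompt))


def dispatch(prompts, backends=("cpu", "gpu")):
    """Fan prompts out to one worker thread per backend and collect the results."""
    jobs = queue.Queue()
    for prompt in prompts:
        jobs.put(prompt)

    results = []  # list.append is atomic in CPython, so no extra lock is used
    workers = [
        threading.Thread(target=run_backend, args=(name, jobs, results))
        for name in backends
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

In practice the two engines would more likely be separate processes, each with its own copy of the weights, which is exactly the memory-sharing limitation mentioned above.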
The GPU's energy use per ML task is so much lower that you'd probably get better performance running everything on the GPU.
I think some repos have tried splitting things up between the NPU and GPU as well, but they didn't get good performance out of that combination. Not sure why, as the NPU is very low power.
Technically any newish laptop with 64GB of RAM has 64GB of "VRAM," but right now the Apple M series and AMD 7000 series are the only IGPs with any significant ML power.
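To put a number on why that matters, here is a back-of-the-envelope sketch of whether a model's weights fit in shared RAM. The function names and the 8 GB reserved for the OS are my own assumptions, and it counts weights only (no KV cache or activations), so treat it as a lower bound.

```python
def model_footprint_gb(n_params, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_shared_ram(n_params, bits_per_weight, ram_gb, reserve_gb=8.0):
    """True if the weights fit after setting aside reserve_gb for the OS (assumed value)."""
    return model_footprint_gb(n_params, bits_per_weight) <= ram_gb - reserve_gb


# A 70B-parameter model: 4-bit weights vs. 16-bit weights on a 64 GB machine.
print(model_footprint_gb(70e9, 4))      # 35.0
print(fits_in_shared_ram(70e9, 4, 64))  # True
print(fits_in_shared_ram(70e9, 16, 64)) # False
```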
I’m not sure what you mean. Typically, an iGPU slices off part of RAM for the GPU at boot time, which means it’s fixed and not shared. When did this change?
For Intel, going by the chart under "What is the maximum amount of graphics memory or video memory my computer can use?" and the discussion under "Will adding more physical memory increase my graphics memory amount?" at https://www.intel.com/content/www/us/en/support/articles/000..., it seems the iGPUs included with 5th gen/Broadwell processors in 2014 were their first to do so.
Full unified memory came 10-ish years ago (also powering the PS4), but I think the hardware ability to adjust iGPU memory without rebooting predates that. Intel seems to have called it DVMT.