llama.cpp typically needs to be compiled separately for each operating system and architecture (Windows, macOS, Linux, etc.), which makes it less portable.
Also, the article mentions the use of hardware acceleration on devices with heterogeneous hardware accelerators. This implies that the Wasm-compiled program can efficiently utilize different hardware resources (like GPUs and specialized AI chips) across various devices. A direct C++ implementation might require specific optimizations or versions for each type of hardware to achieve similar performance.
> Wasm-compiled program can efficiently utilize different hardware resources (like GPUs and specialized AI chips) across various devices
I do not buy it, but maybe I am ignorant of progress being made there.
> A direct C++ implementation might require specific optimizations or versions for each type of hardware to achieve similar performance.
Because I do not buy the previous claim, I also do not buy that similar performance can be achieved there painlessly (without extra developer time), or that a Wasm runtime is capable of achieving it.
So the magic (or sleight of hand, if you prefer) seems to be in
> You just need to install the WasmEdge with the GGML plugin.
And it turns out that all these plugins are native & specific to the acceleration environment as well. But this installation has to happen after the application lands in its environment, so your "portable" application is now only portable in the sense that, once it starts running, it will bootstrap itself by downloading and installing native, platform-specific code from the internet. Whether that is a reasonable thing for an "edge" application to do, I am not sure.
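For reference, the install step the article hand-waves over looks roughly like this (a sketch based on WasmEdge's published install script; the plugin name and script URL may differ between versions):

```shell
# Fetch and run the WasmEdge installer, asking it to also pull the
# GGML (wasi_nn) plugin. The installer detects the host OS, CPU
# architecture, and available accelerators, and downloads a NATIVE,
# platform-specific plugin build -- which is exactly the point above:
# the acceleration code is not part of the portable .wasm binary.
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh \
  | bash -s -- --plugins wasi_nn-ggml
```

So the .wasm file itself stays portable, but it is only useful on a host that has already installed the matching native plugin.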
Where have I seen this WORA before, including for C and C++?
WASM does not provide access to hardware acceleration on devices with heterogeneous hardware accelerators, even its SIMD bytecodes are a subset of what most CPUs are capable of.
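To make the "subset" point concrete: WASM SIMD defines a single fixed-width vector type, `v128` (4 f32 lanes), while AVX-512 on recent x86 CPUs operates on 512-bit vectors (16 f32 lanes). A trivial sketch of what that width gap means for a hot loop (illustrative arithmetic only, not an actual SIMD benchmark):

```c
#include <stdio.h>

// Vector operations needed to process n f32 elements at a given SIMD
// width in bits. WASM SIMD is fixed at 128-bit v128 vectors; AVX-512
// processes 512 bits per instruction.
static int vector_ops(int n, int vector_bits) {
    int lanes = vector_bits / 32;      // f32 lanes per vector op
    return (n + lanes - 1) / lanes;    // ceil(n / lanes)
}

int main(void) {
    int n = 4096;  // e.g. one row of a model weight matrix
    printf("wasm v128: %d vector ops\n", vector_ops(n, 128));
    printf("AVX-512:   %d vector ops\n", vector_ops(n, 512));
    return 0;
}
```

All else being equal, the fixed 128-bit width means a 4x instruction-count handicap against AVX-512 before the runtime does any clever recompilation.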