> On Windows, that usually means you need to open the MSVC x64 native command prompt and run llamafile there for the first invocation, so it can build a DLL with native GPU support. After that, $CUDA_PATH/bin still usually needs to be on $PATH so the GGML DLL can find its other CUDA dependencies.
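> Concretely, the first GPU invocation ends up looking something like this (the -ngl flag follows llama.cpp's convention for offloading layers to the GPU; check llamafile's own --help for the exact flags, and treat the model filename as a placeholder):
>
>     rem From an "x64 Native Tools Command Prompt for VS 2022"
>     set PATH=%CUDA_PATH%\bin;%PATH%
>     llamafile.exe -m model.gguf -ngl 35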
Yeah, I think the setup lost most users there.
A separate model/app approach (like Koboldcpp) seems way easier TBH.
Author here. llamafile will work on stock Windows installs using CPU inference. No CUDA or MSVC or DLLs are required! Right now, the dev tools are only required if you want to get faster GPU performance.
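For anyone who wants to sanity-check the CPU path first, a stock Windows install needs nothing but the download and a rename (Windows decides what's executable by file extension, so the rename matters; the filename below is just an example):

    ren mistral-7b.llamafile mistral-7b.exe
    mistral-7b.exe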
My attempt to run it with my VS 2022 dev console and a freshly downloaded CUDA installation ended in flames: the compilation stopped with "error limit reached", and it then fell back to a CPU run.
It does run on the CPU though, so at least that's pretty cool.
I've received a lot of good advice today on how we can potentially improve our Nvidia story so that nvcc doesn't need to be installed. With a little bit of luck, you'll have releases soon that get your GPU support working.
The CPU usage is around 30% when idle (not handling any HTTP requests) under Windows, so you won't want to keep this app running in the background. Otherwise, it's a nice effort.
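If you want to verify this yourself from cmd.exe (assuming the binary was renamed to llamafile.exe), run the following a few seconds apart while the server is idle and watch whether the "CPU Time" column keeps climbing:

    rem List the process with verbose info, including cumulative CPU Time
    tasklist /fi "imagename eq llamafile.exe" /v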
I'm sure doing better by Windows users is on the roadmap (e.g., exec'ing and then re-exec'ing to get into the right runtime), but it's a good first step towards making things easy.