> On Windows, that usually means you need to open the MSVC x64 native command prompt and run llamafile there for the first invocation, so it can build a DLL with native GPU support. After that, $CUDA_PATH/bin still usually needs to be on $PATH so the GGML DLL can find its other CUDA dependencies.
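> Concretely, the first GPU invocation ends up looking something like this (the -ngl flag follows llama.cpp's convention for offloading layers to the GPU; check llamafile's own --help for the exact flags, and treat the model filename as a placeholder):
>
>     rem From an "x64 Native Tools Command Prompt for VS 2022"
>     set PATH=%CUDA_PATH%\bin;%PATH%
>     llamafile.exe -m model.gguf -ngl 35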
Yeah, I think the setup lost most users there.
A separate model/app approach (like Koboldcpp) seems way easier TBH.
Author here. llamafile will work on stock Windows installs using CPU inference. No CUDA or MSVC or DLLs are required! Right now, the dev tools are only required if you want to get faster GPU performance.
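For anyone who wants to sanity-check the CPU path first, a stock Windows install needs nothing but the download and a rename (Windows decides what's executable by file extension, so the rename matters; the filename below is just an example):

    ren mistral-7b.llamafile mistral-7b.exe
    mistral-7b.exe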
My attempt to run it with my VS 2022 dev console and a freshly downloaded CUDA installation ended in flames: the compilation stopped with "error limit reached", and it then fell back to a CPU run.
It does run on the CPU though, so at least that's pretty cool.
I've received a lot of good advice today on how we can potentially improve our Nvidia story so that nvcc doesn't need to be installed. With a little bit of luck, you'll have releases soon that get your GPU support working.
The CPU usage is around 30% when idle (not handling any HTTP requests) under Windows, so you won't want to keep this app running in the background. Otherwise, it's a nice effort.
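If you want to verify this yourself from cmd.exe (assuming the binary was renamed to llamafile.exe), run the following a few seconds apart while the server is idle and watch whether the "CPU Time" column keeps climbing:

    rem List the process with verbose info, including cumulative CPU Time
    tasklist /fi "imagename eq llamafile.exe" /v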
I'm sure doing better by Windows users is on the roadmap (e.g., exec'ing and then re-exec'ing to get into the right runtime), but it's a good first step towards making things easy.