The backends du jour are either llama.cpp frontends (I use Kobold.cpp at the moment) or oobabooga as the guide specifies, but with the ExLlamaV2 backend.
If you are serving a bunch of people, run a vLLM backend instead since it supports batching, and host it on the Horde if you are feeling super nice: https://lite.koboldai.net/#
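For reference, spinning up vLLM's OpenAI-compatible server looks roughly like this (the model name and port are just example placeholders, and flags change between releases, so check the vLLM docs):

```shell
# Install vLLM (needs a CUDA-capable GPU); flags below are a sketch, not gospel
pip install vllm

# Launch the OpenAI-compatible API server -- continuous batching is on by default,
# so concurrent users' requests get batched together automatically.
# Model name and port are example placeholders.
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.1 \
    --port 8000
```

Any OpenAI-style client can then point at `http://localhost:8000/v1` and you get batched throughput for free.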
Technically only vLLM will work with this new model at the moment, but I'm sure cpp/ooba support will be added within days.
This comment will probably be obsolete within a month, when llama.cpp gets batching, MLC gets a better frontend, or some other breakthrough happens :P