Hacker News

It’s hard to keep up with all developments around LLaMA. What’s the best RLHF alpaca like model you can download right now?



Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality by the Team with members from UC Berkeley, CMU, Stanford, and UC San Diego

https://vicuna.lmsys.org/


I have got Vicuna-13B working on an RTX 3090 Ti + OpenCL + CPU, with 90% of the weights on the GPU (it runs out of memory otherwise), at around 500 ms per token.

This model is really good for a (semi-)open-source model. I think this may be the first locally runnable model that I will actually use for real work rather than just playing around for fun.

It's not at ChatGPT's level, but it's not far behind. It will draw ASCII-art HUDs for a text adventure, analyze data, recognize languages, or write stories. AFAIK it's been trained on ChatGPT conversations, so that makes sense.

This AI still gets uppity sometimes about offensive content, but unlike with ChatGPT, you can edit the prompts to put words in its mouth and encourage it to answer properly.


How did you distribute the weights between CPU and GPU? Thanks


See my response to the sibling comment; I implemented it in a custom Rust implementation.


Mind sharing how you got it to work in your setup?


I work on an independent LLM implementation here: https://github.com/Noeda/rllama/

I only got it working yesterday, and there's no nice UX at all. Not sure I recommend trying to use this, as llama.cpp will probably have this feature in no time with a much better user experience, although I am also trying to make mine more usable.

If you follow the instructions on the Vicuna page on how to apply the deltas, and you can compile the project, then you can run:

cargo run --release --features opencl -- --model-path /models/vicuna13b --param-path /models/vicuna13b/config.json --tokenizer-path /models/vicuna13b/tokenizer.model --prompt-file prompt --top-p 1.0 --top-k 20 --repetition-penalty 1 --temperature 0.9 --max-seq-len 2048 --f16 --percentage-to-gpu 0.9

Where /models/vicuna13b is the HuggingFace-compatible model. This puts 90% of the weights on the GPU and the remaining 10% on the CPU, which is just barely enough not to run out of GPU memory (on a 24 GB card).
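A minimal sketch of how a percentage-based split like `--percentage-to-gpu 0.9` might be computed, assuming (hypothetically) that whole transformer layers are moved and that all layers are roughly the same size; this is not rllama's actual implementation:

```rust
// Hypothetical helper: decide how many transformer layers to place on the
// GPU for a given target fraction of the weights. Assumes layers are moved
// whole and are equal in size, which is only approximately true in practice.
fn layers_on_gpu(total_layers: usize, fraction: f32) -> usize {
    ((total_layers as f32) * fraction).floor() as usize
}

fn main() {
    // LLaMA-13B has 40 transformer layers; a 0.9 fraction puts 36 on the GPU.
    assert_eq!(layers_on_gpu(40, 0.9), 36);
    println!("{} layers on GPU", layers_on_gpu(40, 0.9));
}
```

The remaining layers stay in system RAM and run on the CPU, which is what makes a 13B f16 model fit on a 24 GB card.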

Create a text file 'prompt' with the prompt. I've been using this template:

You are a helpful and precise assistant for checking the quality of the answer.###Human: Can you explain nuclear power to me?###Assistant:

(The model seems to use ### as a delimiter to distinguish Human and Assistant.) The "system prompt" is whatever text is written at the beginning.
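The template above can be generated programmatically. A hypothetical helper (the function name is mine, not from rllama) that assembles the ###-delimited format:

```rust
// Hypothetical helper that builds the prompt format described above:
// system text, then "###Human: <question>###Assistant:".
fn build_prompt(system: &str, question: &str) -> String {
    format!("{system}###Human: {question}###Assistant:")
}

fn main() {
    let prompt = build_prompt(
        "You are a helpful and precise assistant for checking the quality of the answer.",
        "Can you explain nuclear power to me?",
    );
    // The prompt ends with the Assistant marker so the model continues
    // generating the assistant's reply.
    assert!(prompt.ends_with("###Assistant:"));
    println!("{prompt}");
}
```

Ending the prompt at "###Assistant:" is what cues the model to generate the assistant's turn rather than continuing the human's.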


The feature to load a percentage of the weights onto the GPU is novel and amazing! I couldn't get the project up and running myself (it requires a nightly Rust build), but I love this particular innovation.


Related:

Vicuna: An open-source chatbot impressing GPT-4 with 90% ChatGPT quality - https://news.ycombinator.com/item?id=35378683 - March 2023 (167 comments)


Are the Vicuna weights available for download, and are they llama.cpp-compatible? I can't grok that from skimming the page...


The GitHub page (https://github.com/lm-sys/FastChat#vicuna-weights) is clearer:

> We release Vicuna weights as delta weights to comply with the LLaMA model license. You can add our delta to the original LLaMA weights to obtain the Vicuna weights.


Officially, only as deltas against the LLaMA weights, which require a complicated and resource-intensive conversion procedure. Unofficially, yes, a pre-converted llama.cpp-compatible ggml file is available, but obviously I won't post the link here to avoid violating Y Combinator's terms of use.


Vicuna's fine-tune of the LLaMA weights is available for download as a set of "deltas".

So you get the LLaMA weights (from somewhere), then apply the Vicuna deltas to them to end up with the Vicuna model.
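Conceptually, the delta scheme is simple: each released Vicuna tensor is the difference between the fine-tuned weights and the original LLaMA weights, so reconstruction is elementwise addition. A hedged sketch (the actual conversion is done with FastChat's Python tooling, not this code):

```rust
// Conceptual sketch of delta application: vicuna = llama + delta,
// applied elementwise to each weight tensor. Illustration only; the
// real procedure uses FastChat's tooling on HuggingFace checkpoints.
fn apply_delta(base: &[f32], delta: &[f32]) -> Vec<f32> {
    assert_eq!(base.len(), delta.len(), "tensor shapes must match");
    base.iter().zip(delta.iter()).map(|(b, d)| b + d).collect()
}

fn main() {
    // Toy "tensor" values chosen to be exactly representable in f32.
    let llama = [0.5_f32, -1.0, 2.0];
    let delta = [0.25_f32, 0.5, -0.5];
    let vicuna = apply_delta(&llama, &delta);
    assert_eq!(vicuna, vec![0.75, -0.5, 1.5]);
}
```

Distributing only the deltas means nobody redistributes the LLaMA weights themselves, which is the point of the licensing workaround.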


The weight deltas are available: https://github.com/lm-sys/FastChat#vicuna-weights


I recently found this list of models that work with llama.cpp: https://rentry.org/nur779 (with download links, although given LLaMA's licensing gray area, use at your own risk).

The latest so far is Vicuna, whose weights were just recently released.


Vicuna



