Hacker News

It’s hard to keep up with all developments around LLaMA. What’s the best RLHF alpaca like model you can download right now?



Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality by the Team with members from UC Berkeley, CMU, Stanford, and UC San Diego

https://vicuna.lmsys.org/


I have got Vicuna-13B working on an RTX 3090 Ti + OpenCL + CPU, with 90% of the weights on the GPU (it runs out of memory otherwise), at around 500 ms per token.

This model is really good for a (semi-)open-source model. I think this may be the first locally runnable model that I will actually use for real work rather than just playing around for fun.

It's not at ChatGPT's level, but it's not far behind. It will draw ASCII-art HUDs for a text adventure, analyze data, recognize languages, or write stories. AFAIK it's been trained on ChatGPT conversations, so that makes sense.

This AI still gets uppity sometimes about offensive content, but unlike with ChatGPT, you can edit the prompts to put words in its mouth and encourage it to answer properly.


How did you distribute the weights between CPU and GPU? Thanks


See my response to the sibling comment; I implemented it in a custom Rust implementation.


Mind sharing how you got it to work in your setup?


I work on an independent LLM implementation here: https://github.com/Noeda/rllama/

I only got it working yesterday, and there's no nice UX at all. Not sure I recommend trying to use this, as llama.cpp will probably have this feature in no time with a much better user experience, although I am also trying to make mine more usable.

If you follow the instructions on the Vicuna page on how to apply the deltas, and you can compile the project, then you can run:

cargo run --release --features opencl -- --model-path /models/vicuna13b --param-path /models/vicuna13b/config.json --tokenizer-path /models/vicuna13b/tokenizer.model --prompt-file prompt --top-p 1.0 --top-k 20 --repetition-penalty 1 --temperature 0.9 --max-seq-len 2048 --f16 --percentage-to-gpu 0.9

Where /models/vicuna13b is the HuggingFace-compatible model. This puts 90% of the weights on the GPU and the remaining 10% on the CPU, which is just barely enough not to run out of GPU memory (on a 24 GB card).
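A minimal sketch of how a percentage-based split like `--percentage-to-gpu 0.9` might be computed, assuming (hypothetically) that whole transformer layers are moved and that all layers are roughly the same size; this is not rllama's actual implementation:

```rust
// Hypothetical helper: decide how many transformer layers to place on the
// GPU for a given target fraction of the weights. Assumes layers are moved
// whole and are equal in size, which is only approximately true in practice.
fn layers_on_gpu(total_layers: usize, fraction: f32) -> usize {
    ((total_layers as f32) * fraction).floor() as usize
}

fn main() {
    // LLaMA-13B has 40 transformer layers; a 0.9 fraction puts 36 on the GPU.
    assert_eq!(layers_on_gpu(40, 0.9), 36);
    println!("{} layers on GPU", layers_on_gpu(40, 0.9));
}
```

The remaining layers stay in system RAM and run on the CPU, which is what makes a 13B f16 model fit on a 24 GB card.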

Create a text file 'prompt' with the prompt. I've been using this template:

You are a helpful and precise assistant for checking the quality of the answer.###Human: Can you explain nuclear power to me?###Assistant:

(The model seems to use ### as a delimiter to distinguish Human and Assistant.) The "system prompt" is whatever text is written at the beginning.
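The template above can be generated programmatically. A hypothetical helper (the function name is mine, not from rllama) that assembles the ###-delimited format:

```rust
// Hypothetical helper that builds the prompt format described above:
// system text, then "###Human: <question>###Assistant:".
fn build_prompt(system: &str, question: &str) -> String {
    format!("{system}###Human: {question}###Assistant:")
}

fn main() {
    let prompt = build_prompt(
        "You are a helpful and precise assistant for checking the quality of the answer.",
        "Can you explain nuclear power to me?",
    );
    // The prompt ends with the Assistant marker so the model continues
    // generating the assistant's reply.
    assert!(prompt.ends_with("###Assistant:"));
    println!("{prompt}");
}
```

Ending the prompt at "###Assistant:" is what cues the model to generate the assistant's turn rather than continuing the human's.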


The feature to load a percentage of the weights onto the GPU is novel and amazing! I couldn't get the project up and running myself (it requires a nightly Rust build), but I love this particular innovation.


Related:

Vicuna: An open-source chatbot impressing GPT-4 with 90% ChatGPT quality - https://news.ycombinator.com/item?id=35378683 - March 2023 (167 comments)


Are the Vicuna weights available for download, and are they llama.cpp-compatible? I can't grok that from skimming the page...


The GitHub page (https://github.com/lm-sys/FastChat#vicuna-weights) is clearer:

> We release Vicuna weights as delta weights to comply with the LLaMA model license. You can add our delta to the original LLaMA weights to obtain the Vicuna weights.


Officially, only as deltas against the LLaMA weights, which require a complicated and resource-intensive conversion procedure. Unofficially, yes, a pre-converted llama.cpp-compatible ggml file is available, but obviously I won't post the link here to avoid violating Y Combinator's terms of use.


Vicuna's fine-tune of the LLaMA weights is available for download as a set of "deltas".

So you get the LLaMA weights (from somewhere), then apply the Vicuna deltas to them to end up with the Vicuna model.
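Conceptually, the delta scheme is simple: each released Vicuna tensor is the difference between the fine-tuned weights and the original LLaMA weights, so reconstruction is elementwise addition. A hedged sketch (the actual conversion is done with FastChat's Python tooling, not this code):

```rust
// Conceptual sketch of delta application: vicuna = llama + delta,
// applied elementwise to each weight tensor. Illustration only; the
// real procedure uses FastChat's tooling on HuggingFace checkpoints.
fn apply_delta(base: &[f32], delta: &[f32]) -> Vec<f32> {
    assert_eq!(base.len(), delta.len(), "tensor shapes must match");
    base.iter().zip(delta.iter()).map(|(b, d)| b + d).collect()
}

fn main() {
    // Toy "tensor" values chosen to be exactly representable in f32.
    let llama = [0.5_f32, -1.0, 2.0];
    let delta = [0.25_f32, 0.5, -0.5];
    let vicuna = apply_delta(&llama, &delta);
    assert_eq!(vicuna, vec![0.75, -0.5, 1.5]);
}
```

Distributing only the deltas means nobody redistributes the LLaMA weights themselves, which is the point of the licensing workaround.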


The weight deltas are available: https://github.com/lm-sys/FastChat#vicuna-weights


I recently found this list of models that work with llama.cpp: https://rentry.org/nur779 (with download links, although given LLaMA's licensing gray area, use at your own risk).

The latest so far is Vicuna, whose weights were just recently released.


Vicuna



