"Fine-tuning Llama 2 70B on three iterations of our approach yields a model that outperforms many existing systems on the AlpacaEval 2.0 leaderboard, including Claude 2, Gemini Pro, and GPT-4 0613."
Cool and impressive. I'm curious if this training method will become more common.
"We would also like to acknowledge contemporary work published independently on arXiv on 2024-01-18 by Meta & NYU (Yuan, et al) in a paper called Self-Rewarding Language Models, which proposes a similar general approach for creating alignment pairs from a larger set of candidate responses, but using the LLM as the reward model. While this may work for general-purpose models, our experience has shown that task-specific reward models guided by SMEs are necessary for most enterprise applications of LLMs for specific use cases, which is why we focus on the use of external reward models."
I kind of disagree. It's not "user friendly", but it is very descriptive. They are codenames, after all. Take "dolphin-2.6-mistral-7b-dpo-laser" for instance: with a little LLM background knowledge, just from the name you know it is a 7-billion-parameter model based on Mistral, trained on a dataset filtered to remove alignment and bias (dolphin), version 2.6, and improved using the techniques described in the Direct Preference Optimization (https://arxiv.org/pdf/2305.18290.pdf) and LASER (https://arxiv.org/pdf/2312.13558.pdf) papers.
Thank you for a great and informative explanation despite my somewhat ignorant take.
I'm an occasional visitor to huggingface, so I'm actually superficially familiar with the taxonomy. I just felt like, even if I tried to satirize it, I wouldn't be able to come up with a crazier name. And that's not even the end of the Cambrian explosion of LLMs.