
"Fine-tuning Llama 2 70B on three iterations of our approach yields a model that outperforms many existing systems on the AlpacaEval 2.0 leaderboard, including Claude 2, Gemini Pro, and GPT-4 0613."

Cool and impressive. I'm curious if this training method will become more common.
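
For anyone skimming: the loop in the paper is roughly that the model answers prompts, scores its own answers with an LLM-as-a-Judge prompt, turns the best and worst answers into preference pairs, and is then DPO-trained on them. In pseudocode (all helper names here are hypothetical, not from the paper's code):

    # One iteration of the self-rewarding loop sketched in the paper.
    # model.generate, model.judge and dpo_train are hypothetical helpers.
    N = 4  # candidate responses sampled per prompt
    pairs = []
    for prompt in prompts:
        candidates = [model.generate(prompt) for _ in range(N)]
        # The model grades its own outputs via an LLM-as-a-Judge prompt
        scores = [model.judge(prompt, c) for c in candidates]
        ranked = sorted(zip(scores, candidates), key=lambda t: t[0])
        # Highest-scored response becomes "chosen", lowest "rejected"
        pairs.append((prompt, ranked[-1][1], ranked[0][1]))
    model = dpo_train(model, pairs)  # repeat with the updated model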




A new 7B model, Snorkel-Mistral-PairRM-DPO, which uses a similar self-rewarding pipeline, was just released:

* Announcement: https://twitter.com/billyuchenlin/status/1749975138307825933

* Model Card: https://huggingface.co/snorkelai/Snorkel-Mistral-PairRM-DPO

* Response Re-Ranker: https://huggingface.co/llm-blender/PairRM
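
The re-ranker is easy to try on its own. Something like this, following the usage shown on the PairRM model card (via the llm-blender package; double-check the exact method names against the card):

    import llm_blender  # pip install llm-blender

    blender = llm_blender.Blender()
    blender.loadranker("llm-blender/PairRM")

    prompts = ["What is the capital of France?"]
    candidates = [[
        "Paris is the capital of France.",
        "I believe it might be Lyon.",
    ]]
    # Returns a rank per candidate for each prompt (1 = most preferred)
    ranks = blender.rank(prompts, candidates, return_scores=False)
    print(ranks)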

"We would also like to acknowledge contemporary work published independently on arXiv on 2024-01-18 by Meta & NYU (Yuan, et al) in a paper called Self-Rewarding Language Models, which proposes a similar general approach for creating alignment pairs from a larger set of candidate responses, but using the LLM as the reward model. While this may work for general-purpose models, our experience has shown that task-specific reward models guided by SMEs are necessary for most enterprise applications of LLMs for specific use cases, which is why we focus on the use of external reward models."


>Snorkel-Mistral-PairRM-DPO

The naming of these models is getting ridiculous...


I kind of disagree. It's not "user friendly," but it is very descriptive. They are codenames, after all. Take "dolphin-2.6-mistral-7b-dpo-laser" for instance: with a little LLM background knowledge, you know from the name alone that it is a 7-billion-parameter model based on Mistral, trained on a filtered dataset to remove alignment and bias (dolphin), version 2.6, and that it uses the techniques described in the Direct Preference Optimization (https://arxiv.org/pdf/2305.18290.pdf) and LASER (https://arxiv.org/pdf/2312.13558.pdf) papers to improve its output.
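
The DPO part in particular is less exotic than it sounds: instead of training a separate reward model, you optimize the policy directly on preference pairs. A minimal sketch of the loss, assuming you already have summed log-probs for the chosen and rejected responses under both the policy and a frozen reference model (tensor names are mine, not from the paper):

    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Log-ratio of policy vs. frozen reference, per response
        chosen_logratio = policy_chosen_logps - ref_chosen_logps
        rejected_logratio = policy_rejected_logps - ref_rejected_logps
        # Maximize the margin between preferred and dispreferred responses
        return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()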


And here I was thinking they were somehow using the first three words from my Bitcoin wallet.


Thank you for a great and informative explanation despite my somewhat ignorant take.

I'm an occasional visitor to huggingface, so I'm actually superficially familiar with the taxonomy. I just felt like, even if I tried to satirize it, I wouldn't be able to come up with a crazier name. And the Cambrian explosion of LLMs isn't even over yet.


A bit like the User-Agent string.


Thanks for this, and the link you provided below for GGUF files! I just cleared my schedule this afternoon to kick the tires.


I assume this doesn’t yet run on llama.cpp?



It is based on Mistral, which llama.cpp supports, so I assume it does run (you might need to convert the weights to GGUF format and quantize them first).
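
From memory, the usual recipe is something like this (script names and flags vary across llama.cpp versions, so check the repo's README):

    # inside a llama.cpp checkout, with the HF weights downloaded locally
    python convert.py ./Snorkel-Mistral-PairRM-DPO --outtype f16 \
        --outfile snorkel-mistral-pairrm-dpo.f16.gguf
    ./quantize snorkel-mistral-pairrm-dpo.f16.gguf \
        snorkel-mistral-pairrm-dpo.Q4_K_M.gguf Q4_K_M
    ./main -m snorkel-mistral-pairrm-dpo.Q4_K_M.gguf -p "Hello"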



