Keep in mind that the rankings are just that: an order of "best" to "worst". There's nothing on there that indicates how much better one model is than another.
Looks like 62 points better, which under the Elo model implies about a 59/41 matchup (i.e., GPT-4-Turbo has roughly a 59% chance of beating Claude on a random prompt). You can look up an Elo probability calculator, plug in an Elo difference, and see this number. For example, plug in two Elo ratings here: https://sandhoefner.github.io/chess.html - really only the difference between them matters.
Btw, the similarity of the numbers 59 and 62 here is a coincidence; other rating gaps won't map to probabilities that happen to be nearly the same number.
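For the curious, the ~59% drops straight out of the standard Elo expected-score formula. A minimal sketch (the absolute ratings are arbitrary, only the gap matters):

    # Elo expected score: probability that A beats B, given only the rating gap.
    def elo_win_probability(rating_a: float, rating_b: float) -> float:
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

    print(round(elo_win_probability(1262, 1200), 3))  # 62-point gap -> ~0.588, i.e. roughly 59%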
This is a finetune of the torrent-dropped MoE model from a few days ago. Not a "release" per se, but rather a display of community excitement over it.
Honestly, none of them is particularly incredible compared to the others. At this point everything is a prototype; there's no sense in building knowledge specific to them unless you're also building LLM products.
It’s just distributed ADHD at this point. LLMs are new and cool and each new release will be significantly better than the last, but as with any emerging tech, we’re on an exponential curve so there’s no sense in falling in love with specifics or products until things stabilize.
"Put all your eggs in one basket" isn't the traditional advice.
Any one of these companies can at any time change their API, pricing, access rules, or even swap the model out for a dumber, cheaper one at the same price. You'll have no recourse if you don't have several backends available, or control your own.
At a minimum, you should have several hot-swappable backends/APIs available if you want to remain viable in an indeterminate future.
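Something like this is all I mean by "hot-swappable"; the backend callables here are hypothetical placeholders, not real SDK calls - the point is just that the rest of your product codes against one interface:

    # Minimal sketch of the "several hot-swappable backends" idea.
    from typing import Callable, List, Optional

    Backend = Callable[[str], str]  # prompt in, completion out

    def complete_with_fallback(prompt: str, backends: List[Backend]) -> str:
        last_error: Optional[Exception] = None
        for backend in backends:
            try:
                return backend(prompt)
            except Exception as err:  # rate limits, outages, silent model swaps...
                last_error = err
        raise RuntimeError("all backends failed") from last_error

    # backends = [call_hosted_api, call_other_vendor, call_local_model]  # hypothetical
    # answer = complete_with_fallback("Summarize this:", backends)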
I want someone to offer a large sum of money to anyone who can leak the full GPT-4 model to prevent ClosedAI's lunacy. We already got a model leak from NAI and it forever propelled AI waifus. Why can't some Good Samaritan take one for the team and give the world access to the closest thing we have to AGI? Whoever does it will be remembered even more positively than Aaron Swartz.
Why should I respect copyright on a model when the model training didn’t respect copyright? To be clear, I’m a copyright abolitionist so I’m down with not accepting copyright. I want someone to force OpenAI to be actually open.
Why not? For simple things like converting and translating it's quite handy, and the costs stay under your control. I use open models for removing backgrounds from images, and also to translate texts. OpenAI would be too expensive for my use cases.
Man, I liked OpenRouter but most models run so much slower on OR that I had to switch to other solutions. Is there a reason for horribly slow Meta Llama 2 models?
Somewhere between shiny Google releases and Mistral's magnet link tweet, there's gotta be a sweet spot where you release the model but also have enough decency to tell people how to use it optimally. Mistral, if you're reading this, I'm talking about you.
llama.cpp support is needed ASAP, and not just quantized: I hope it will be able to run in a way that keeps the same two experts for a minimum of N generated tokens and only then re-evaluates which experts to select, so that it will be possible to do the inference in low RAM by doing something similar to "paging".
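To make the idea concrete, here's a rough sketch. Note this is not how Mixtral's router actually works (it re-picks the top-2 experts for every token at every layer, so doing this would change model behavior), and the model methods are hypothetical; it only illustrates the "page experts in less often" intuition:

    # Hypothetical "sticky experts" generation loop (pseudocode in Python).
    REEVALUATE_EVERY = 16  # N tokens between expert re-selection

    def generate(model, prompt_tokens, max_new_tokens):
        tokens = list(prompt_tokens)
        active_experts = None
        for step in range(max_new_tokens):
            if active_experts is None or step % REEVALUATE_EVERY == 0:
                # pick (and page into RAM) the two experts the router currently prefers
                active_experts = model.select_top2_experts(tokens)   # hypothetical method
                model.load_experts_into_ram(active_experts)          # hypothetical method
            tokens.append(model.next_token(tokens, experts=active_experts))
        return tokens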
I'm looking forward to a 4-bit quantized version that I can run on my Intel Thinkpad with 32GB RAM; it should be almost usable if it's as fast as a 14B-size model.
Whether it's any good is another matter, I guess the leaderboard on HuggingFace will be updated at some point.
The linked comment is authored by the CEO of Shopify; the top-level comment in this thread is by the .. what would you say, author/originator/founder of Redis. (And maybe/presumably more widely known in C-land for other projects or contributions, idk.)
> so that it will be possible to do the inference in low ram doing something similar to "paging".
I was also hoping for that, but my initial research suggests that you need to fit everything into VRAM (or RAM when doing llama.cpp CPU inference).
The upside is that it should be as fast as running a single 7B model, which on llama.cpp should be in the ballpark of 10 tok/sec when 4-bit quantized.
For local usage, I'd say you will have an "OK" experience on a recent laptop with 64GB RAM, which is more realistic than having the necessary VRAM in an Nvidia GPU.
Correction, it should have the performance of a single 12B parameter model:
"Concretely, Mixtral has 45B total parameters but only uses 12B parameters per token. It, therefore, processes input and generates output at the same speed and for the same cost as a 12B model."
PSA: This is broken for Apple silicon. It relies on the CUDA-only `flash_attn` package. Alas, I wanted to try Mistral's MoE on my MacBook. :( Understandable though! It's a new model with an incredibly bizarre release. My phalanges are crossed.
There's probably a better place to ask this highly specific technical question, but I'm avoiding Reddit these days, so I'm just throwing it out here I guess. I've been trying to run these in a container but it's verrrry slow, I believe because of the lack of GPU help. All the instructions I find are for Nvidia GPUs, and my server is a QNAP TVS-473e with an embedded AMD CPU/GPU (I know, I know). The only good news is that I've upgraded the RAM to 32GB, and I have a 1TB SSD. Any idea how I can get my own self-hosted LLM/chat service on this funky hardware? The Nvidia/Docker option requires installing the Nvidia runtime alongside Docker, but I can't find an AMD equivalent.
Thanks. Sorry for the wall of text nobody cares about.
Specs say this runs an AMD RX-421BD. This is a 2015-era AMD CPU with 2 Bulldozer-family cores and a tiny iGPU.
...To be blunt, you would be much better off running LLMs on your phone. Even an older phone. Or literally whatever device you are reading HN on. But if you insist, the runtime you want is MLC-LLM's Vulkan runtime.
This. The sibling llama.cpp comment is the standard "I know llama.cpp, I assume that's 80% of the universe instead of 0.8%, and I assume that's all anyone needs" - just enough knowledge to be dangerous to people looking for advice.
You'll see it over and over again when you're looking for help; be careful, it's 100% a blind alley in your case. It's very likely you'll be disappointed by MLC as well, but at the same time it's your only real option. You definitely won't hit 1 tok/sec, and honestly, I'd bet on more like 0.1 tok/sec.
The OpenCL backend is not that great. It's kinda slow on GPUs, completely unusable on iGPUs, and lags behind the CUDA, Metal and ROCm backends feature-wise.
To be blunt, there isn't much interest in supporting anything outside of Apple/Nvidia. There is a WIP Vulkan backend, but (last I checked) progress is slow and it's not optimized for iGPUs either.
MLC-LLM is much more promising once its features get fleshed out, as it "inherits" support for many devices from its Apache TVM backend.
Yeah, I feel this way about llama.cpp too. Maybe not that exact wording, but it's not the end-all-be-all llama runtime like everyone seems to think it is.
Thanks, I'll look into it! Especially if the llama.cpp route is a dud, like the other response says it will be. My little QNAP clunker handles all the self-hosting stuff I throw at it, but I won't be surprised if it has simply met its match.
> my server is a qnap tvs-473e with an embedded amd cpu/gpu
That's your problem. I googled it, and it looks like one of those all-in-one appliances, like a Drobo or whatever's popular these days. That's not a server. (At least, I wouldn't call it a server. It's an all-in-one appliance, or toy, depending on perspective.) And ye gods, that price...
Spend $500, get an actual computer, not some priced-up appliance, and you'll have a much better time, regardless of whether you spend it on more CPU or more GPU. You can get a used computer off eBay for $100 and shove a $400 graphics card in it. Or maybe get a Ryzen 7 7700X; I'm looking at a mobo+CPU combo with that for $500 right now.
Finally, to make sure this response does contain an answer to what you asked: ;-)
If you can run this stuff in a container on your appliance already, but it's very slow, congrats! I'd call that a win. I looked up the chip, the RX-421BD; it's of similar power to an Athlon circa 2017. I think my router might have more compute power. You _do_ have those 512 shader cores, and given effort, you could try to get them to do something useful. But I wouldn't assume it's possible (well, maybe you don't mind writing your own shaders ;-)). Just because the chip has "some GPU" doesn't mean it has "the right kind of GPU you'd need to hijack for lots of matrix multiplies, without writing the assembly yourself".
Sorry this isn't more helpful, but it's the truth.
Something to add: if your NAS server is advertised with its dual 4K-capable HDMI ports, you know you're looking at a consumer toy targeted at people who hoard (pirated) media, not at serious stuff.
I hadn't even noticed that, I just saw "I could build that for 1/4th the cost"+"wtf, only 4 drives?"
Stuff like this prompts a dual response in me. :)
It always gives me a strong urge to educate "it's not that hard, and it's fun" to build it yourself.
AND it always makes me kick myself for not commercializing the expandable media servers I started building in the early 2000s, for me, for the dorm, for my friends, i.e. exactly the people you identify. :)
Meh, I bought this 5 years ago because there was a sale on 10TB hard drives and I thought "Why shouldn't I become a data hoarder?" And now it runs Home Assistant and Frigate and MeTube and Jellyfin, and if it doesn't work for Ollama then I'll probably just deal with it, lol.
Fair enough, fair enough. Also I'm a data hoarder too, have been since pre-reddit. internet high five. :) I've gotten a lot of mileage (at least 15 years) off of centralizing all my storage, and centralizing all my compute, but _not_ having them be the same box. YMMV, of course.
Unrelated, cool, I hadn't heard of any of those 4 programs, I'm googling now and some look useful. Thanks! Possibly saving me some time in my next project...
Thanks! I was just following the thread about their recent addition of OpenCL support and was on the verge of trying it out last weekend. I'll definitely continue once I'm home again!
Save yourself some time and buy a 4090 if you really want to be high-tier (consumer range); you will have a much faster experience, and not only with text, but also Stable Diffusion etc.
This model is better than many other contenders, but still far from GPT-4.
"what famous brands are there which change one letter from a common word to make a non-existent, but a catchy name, such as "musiq" instead of "music".. etc?"
There are several brands that have played with words by changing a letter or adding a letter to create a new and memorable name. Here are a few examples:
Qatar Airways - This airline's name was inspired by the country of Qatar, but with the letter "Q" replaced by "Q" to create a unique and catchy name.
Lush Cosmetics - This cosmetics company's name is a play on the word "lush," which means "profuse" or "luxuriant."
Pinterest - This social media site's name is a combination of the words "pin" and "interest," but with the letter "e" replaced by "i" to create a more unique and memorable name.
Tumblr - This social media platform's name is a play on the words "tumble" and "blogger," but with the letter "l" replaced by "r" to create a more unique and catchy name.
---------
that is compared to the hilarious Yi 34B Chat below: =)
There are several famous brands that have used this strategy of changing one letter from a common word to create a unique and catchy name. Here are a few examples:
eBay - Instead of "cBay," which would be a reference to "crab," eBay was named after the first three letters of the word "baY" and the "e" from the word "electronic," as in "electronic marketplace."
Google - Originally named "BackRub," the company changed its name to Google, a play on the word "googol," which refers to the number one followed by one hundred zeros, expressing the vastness of the internet.
Kodak - The name Kodak is derived from the word "cod" with an "k" added. It was chosen because it is short, easy to remember, and not suggestive of the photographic nature of the business.
But both are completely wrong! And technically the Google example is closer to correct than any of the others.
The Yi 34B eBay and Kodak examples are both (wrong but) very interesting because it does seem to get the idea of changing one letter.
Of the GPT-4 examples, the Qatar one (replacing "Q" with "Q" !?) is the only one that is internally consistent. The Pinterest and Tumblr examples are wrong in very odd ways, in that the explanation doesn't match the spelling.
Here is my response from Yi, which is actually pretty good:
One well-known example is the brand Apple. The company's name comes from a play on the phrase "Apple of My Eye." Another example is Google, which is derived from the mathematical term "googolplex," which refers to the number 1 followed by 100 zeroes. It was originally meant to be a play on the word "googol" (1 followed by 100 zeroes), but a spelling mistake led to the creation of the new word "google."
Other examples include PepsiCo, which combines the words "pepsin" and "kola," and Amazon, which was inspired by the world's largest river, the Amazon River.
In the case of "musiq," it seems like a made-up name without any specific reference or known origin. However, some possible variations could be created using existing words, such as "dramatize" instead of "dramatize" or "fashin" instead of "fashion."
Actually I just realized I had the prompt template and sampling wrong. First shot once that's fixed:
"
Some examples of famous brands that use this strategy include:
1. KFC (Kentucky Fried Chicken) - The brand takes the first letter of each word in its full name (Kentucky Fried Chicken) and combines them to create a memorable acronym.
2. Pepsi - The brand's name is a play on the word "pepsin," an enzyme used in digestion. The name was chosen because it has a positive connotation and is easy to remember.
3. Adidas - The brand's name is derived from the founder's name, Adolf Dassler, with the 'Das' replaced by an 'a'.
4. Asics - The brand's name is an acronym for "Animal, Speed, Intelligence, and Spirit."
5. H&M (Hennes & Mauritz) - The brand's name is a combination of two Swedish words: "Hennes" means "her" in English, and "Mauritz" refers to the men's clothing store "Mauritz Widforss."
6. Nike - The brand's name comes from the Greek goddess of victory, "Nike."
7. Amazon - The brand's name was inspired by the South American river Amazon, which is one of the world's largest rivers.
8. Apple - The brand's name was chosen due to co-founder Steve Jobs' belief that "Apple" was simple, memorable, and had a positive connotation.
9. Google - The brand's name is a play on the mathematical term "googol," which represents the number 1 followed by 100 zeros.
10. Twitter - The brand's name is a shortened version of "tweet," which refers to short messages sent through the platform.
These brands demonstrate how changing or combining letters from existing words can create unique and memorable names for businesses.
"
- This is not vanilla Yi, but a merge of several finetunes.
LLMs work on tokens, where individual characters are hidden away. They'd have to be explicitly trained to spell each token out into single-letter tokens and, since they're bad at information symmetry, to map single-letter tokens back onto tokens as well. I don't think anybody does this, so they're left with only what's in the training data. Otherwise they have no chance to reconstruct this information, since tokens could just as well map to some equivalent visual language, i.e. something looking like hieroglyphs, where the questions you're asking would be nonsensical.
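You can see this directly with a tokenizer. A quick sketch, assuming the `tiktoken` package (the tokenization used by recent OpenAI models) is installed:

    # The model never "sees" the letters of a word, only opaque token IDs.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    for word in ["music", "musiq", "Pinterest"]:
        ids = enc.encode(word)
        pieces = [enc.decode([i]) for i in ids]
        print(f"{word!r} -> token ids {ids} -> pieces {pieces}")

    # Common words usually map to one or two tokens, and the pieces don't line
    # up with individual letters, which is why "change one letter" questions
    # are hard for the model to answer from the spelling itself.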
I thought so, too. But then I asked it to define fake words that were portmanteaus I made up. Believe me, my understanding of BERT and discriminative models aligned perfectly with what you're saying. But testing the theory that it can break down and make meaning of fake words, with accurate descriptions of which words I'm combining, proved me wrong. Generative models must work differently than you and I thought.
Btw, the other day I uploaded ~4k chats from your llm DB to the code interpreter and had it label them. Worked pretty well. I only gave them one label each at first, then I started working on more complex labeling, but then my poor assistant ran out of steam and the interpreter session expired. So much pleasure and pain! Love your llm CLI though. Thanks.
"Cheeseface just dropped the Blippy-7B model which is almost as good as the twinamp 34B model on the SwagCube benchmark when run locally as int8 and this shows that the gains made by the skibidi-70B model will probably filter down to the baseline Eras models in the next few weeks"
That's giving the community way too much credit for organization.
Everyone just seems to be running experiments independently and then randomly dropping some results, with basically no documentation. Sometimes the motivation is clearly VC money or paper exposure, but sometimes there is no apparent motivation... or even no model card. Then, when something works, others copy the script.
Not that I don't enjoy it. I find the sea of finetune generations fascinating.
Is there a centralized list somewhere that tests "use this for x purpose, use that for y?"