Keep in mind that the rankings are just that: an order of "best" to "worst". There's nothing on there that indicates how much better one model is than another.
Looks like 62 points better, which under the Elo model implies about a 59/41 matchup (i.e., GPT-4-Turbo has roughly a 59% chance of beating Claude on a random prompt). You can look up an Elo probability calculator, plug in an Elo difference, and see this number. For example, plug in two Elo ratings here: https://sandhoefner.github.io/chess.html - really only the difference between them matters.
Btw, the similarity of the numbers 59 and 62 here is a coincidence; other rating gaps won't map to probabilities that happen to be nearly the same number.
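For the curious, the ~59% drops straight out of the standard Elo expected-score formula. A minimal sketch (the absolute ratings are arbitrary, only the gap matters):

    # Elo expected score: probability that A beats B, given only the rating gap.
    def elo_win_probability(rating_a: float, rating_b: float) -> float:
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

    print(round(elo_win_probability(1262, 1200), 3))  # 62-point gap -> ~0.588, i.e. roughly 59%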
This is a finetune of the torrent-dropped MoE model from a few days ago. Not a "release" per se, but rather a display of community excitement over it.
Honestly, none of them is particularly incredible compared to the others. At this point everything is a prototype; there's no sense in building knowledge specific to them unless you're also building LLM products.
It’s just distributed ADHD at this point. LLMs are new and cool and each new release will be significantly better than the last, but as with any emerging tech, we’re on an exponential curve so there’s no sense in falling in love with specifics or products until things stabilize.
"Put all your eggs in one basket" isn't the traditional advice.
Any one of these companies can at any time change their API, pricing, access rules, or even swap the model out for a dumber, cheaper one at the same price. You'll have no recourse if you don't have several backends available, or control your own.
At a minimum, you should have several hot-swappable backends/APIs available if you want to remain viable in an indeterminate future.
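Something like this is all I mean by "hot-swappable"; the backend callables here are hypothetical placeholders, not real SDK calls - the point is just that the rest of your product codes against one interface:

    # Minimal sketch of the "several hot-swappable backends" idea.
    from typing import Callable, List, Optional

    Backend = Callable[[str], str]  # prompt in, completion out

    def complete_with_fallback(prompt: str, backends: List[Backend]) -> str:
        last_error: Optional[Exception] = None
        for backend in backends:
            try:
                return backend(prompt)
            except Exception as err:  # rate limits, outages, silent model swaps...
                last_error = err
        raise RuntimeError("all backends failed") from last_error

    # backends = [call_hosted_api, call_other_vendor, call_local_model]  # hypothetical
    # answer = complete_with_fallback("Summarize this:", backends)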
I want someone to offer a large sum of money to anyone who can leak the full GPT-4 model to prevent ClosedAI's lunacy. We already got a model leak from NAI and it forever propelled AI waifus. Why can't some Good Samaritan take one for the team and give the world access to the closest thing we have to AGI? Whoever does it will be remembered even more positively than Aaron Swartz.
Why should I respect copyright on a model when the model training didn’t respect copyright? To be clear, I’m a copyright abolitionist so I’m down with not accepting copyright. I want someone to force OpenAI to be actually open.
Why not? For simple things like converting and translating it's quite handy, and the costs stay under your control. I use open models for removing backgrounds from images, and also to translate texts. OpenAI would be too expensive for my use cases.
Man, I liked OpenRouter but most models run so much slower on OR that I had to switch to other solutions. Is there a reason for horribly slow Meta Llama 2 models?
Somewhere between shiny Google releases and Mistral's magnet link tweet, there's gotta be a sweet spot where you release the model but also have enough decency to tell people how to use it optimally. Mistral, if you're reading this, I'm talking about you.
llama.cpp support is needed ASAP, and not just quantized: I hope it will be able to run in a way that keeps the same two experts for a minimum of N generated tokens and only then re-evaluates which experts to select, so that it will be possible to do the inference in low RAM by doing something similar to "paging".
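To make the idea concrete, here's a rough sketch. Note this is not how Mixtral's router actually works (it re-picks the top-2 experts for every token at every layer, so doing this would change model behavior), and the model methods are hypothetical; it only illustrates the "page experts in less often" intuition:

    # Hypothetical "sticky experts" generation loop (pseudocode in Python).
    REEVALUATE_EVERY = 16  # N tokens between expert re-selection

    def generate(model, prompt_tokens, max_new_tokens):
        tokens = list(prompt_tokens)
        active_experts = None
        for step in range(max_new_tokens):
            if active_experts is None or step % REEVALUATE_EVERY == 0:
                # pick (and page into RAM) the two experts the router currently prefers
                active_experts = model.select_top2_experts(tokens)   # hypothetical method
                model.load_experts_into_ram(active_experts)          # hypothetical method
            tokens.append(model.next_token(tokens, experts=active_experts))
        return tokens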
I'm looking forward to a 4-bit quantized version that I can run on my Intel Thinkpad with 32GB RAM; it should be almost usable if it's as fast as a 14B-size model.
Whether it's any good is another matter, I guess the leaderboard on HuggingFace will be updated at some point.
The linked comment is authored by the CEO of Shopify; the top-level comment in this thread is by the .. what would you say, author/originator/founder of Redis. (And maybe/presumably more widely known in C-land for other projects or contributions, idk.)
> so that it will be possible to do the inference in low ram doing something similar to "paging".
I was also hoping for that, but my initial research suggests that you need to fit everything into VRAM (or RAM when doing llama.cpp CPU inference).
The upside is that it should be as fast as running a single 7B model, which on llama.cpp should be in the ballpark of 10 tok/sec when 4-bit quantized.
For local usage, I'd say you will have an "OK" experience on a recent laptop with 64GB RAM, which is more realistic than having the necessary VRAM in an Nvidia GPU.
Correction, it should have the performance of a single 12B parameter model:
"Concretely, Mixtral has 45B total parameters but only uses 12B parameters per token. It, therefore, processes input and generates output at the same speed and for the same cost as a 12B model."
PSA: This is broken for Apple silicon. It relies on the CUDA-only `flash_attn` package. Alas, I wanted to try Mistral's MoE on my MacBook. :( Understandable though! It's a new model with an incredibly bizarre release. My phalanges are crossed.
There's probably a better place to ask this highly specific technical question, but I'm avoiding Reddit these days, so I'm just throwing it out here I guess. I've been trying to run these in a container but it's verrrry slow, I believe because of the lack of GPU help. All the instructions I find are for Nvidia GPUs, and my server is a QNAP TVS-473e with an embedded AMD CPU/GPU (I know, I know). The only good news is that I've upgraded the RAM to 32GB, and I have a 1TB SSD. Any idea how I can get my own self-hosted LLM/chat service on this funky hardware? The Nvidia/Docker option requires installing the Nvidia runtime alongside Docker, but I can't find an AMD equivalent.
Thanks. Sorry for the wall of text nobody cares about.
Specs say this runs an AMD RX-421BD. This is a 2015-era AMD CPU with 2 Bulldozer-family cores and a tiny iGPU.
...To be blunt, you would be much better off running LLMs on your phone. Even an older phone. Or literally whatever device you are reading HN on. But if you insist, the runtime you want is MLC-LLM's Vulkan runtime.
This. The sibling llama.cpp comment is the standard "I know llama.cpp, I assume that's 80% of the universe instead of 0.8%, and I assume that's all anyone needs" - just enough knowledge to be dangerous to people looking for advice.
You'll see it over and over again when you're looking for help; be careful, it's 100% a blind alley in your case. It's very likely you'll be disappointed by MLC as well, but at the same time it's your only real option. You definitely won't hit 1 tok/sec, and honestly, I'd bet on more like 0.1 tok/sec.
The OpenCL backend is not that great. It's kinda slow on GPUs, completely unusable on iGPUs, and lags behind the CUDA, Metal and ROCm backends feature-wise.
To be blunt, there isn't much interest in supporting anything outside of Apple/Nvidia. There is a WIP Vulkan backend, but (last I checked) progress is slow and it's not optimized for iGPUs either.
MLC-LLM is much more promising once its features get fleshed out, as it "inherits" support for many devices from its Apache TVM backend.
Yeah, I feel this way about llama.cpp too. Maybe not that exact wording, but it's not the end-all-be-all llama runtime like everyone seems to think it is.
Thanks, I'll look into it! Especially if the llama.cpp route is a dud, like the other response says it will be. My little QNAP clunker handles all the self-hosting stuff I throw at it, but I won't be surprised if it has simply met its match.
> my server is a qnap tvs-473e with an embedded amd cpu/gpu
That's your problem. I googled it, and it looks like one of those all-in-one appliances, like a Drobo or whatever's popular these days. That's not a server. (At least, I wouldn't call it a server. It's an all-in-one appliance, or toy, depending on perspective.) And ye gods, that price...
Spend $500, get an actual computer, not some priced-up appliance, and you'll have a much better time, regardless of whether you spend it on more CPU or more GPU. You can get a used computer off eBay for $100 and shove a $400 graphics card in it. Or maybe get a Ryzen 7 7700X; I'm looking at a mobo+CPU combo with that for $500 right now.
Finally, to make sure this response does contain an answer to what you asked: ;-)
If you can run this stuff in a container on your appliance already, but it's very slow, congrats! I'd call that a win. I looked up the chip, the RX-421BD; it's of similar power to an Athlon circa 2017. I think my router might have more compute power. You _do_ have those 512 shader cores, and given effort, you could try to get them to do something useful. But I wouldn't assume it's possible (well, maybe you don't mind writing your own shaders ;-)). Just because the chip has "some GPU" doesn't mean it has "the right kind of GPU you'd need to hijack for lots of matrix multiplies, without writing the assembly yourself".
Sorry this isn't more helpful, but it's the truth.
Something to add: if your NAS server is advertised with its dual 4K-capable HDMI ports, you know you're looking at a consumer toy targeted at people who hoard (pirated) media, not at serious stuff.
I hadn't even noticed that, I just saw "I could build that for 1/4th the cost"+"wtf, only 4 drives?"
Stuff like this prompts a dual response in me. :)
It always gives me a strong urge to educate "it's not that hard, and it's fun" to build it yourself.
AND it always makes me kick myself for not commercializing the expandable media servers I started building in the early 2000s, for me, for the dorm, for my friends, i.e. exactly the people you identify. :)
Meh, I bought this 5 years ago because there was a sale on 10TB hard drives and I thought "Why shouldn't I become a data hoarder?" And now it runs Home Assistant and Frigate and MeTube and Jellyfin, and if it doesn't work for Ollama then I'll probably just deal with it, lol.
Fair enough, fair enough. Also I'm a data hoarder too, have been since pre-reddit. internet high five. :) I've gotten a lot of mileage (at least 15 years) off of centralizing all my storage, and centralizing all my compute, but _not_ having them be the same box. YMMV, of course.
Unrelated, cool, I hadn't heard of any of those 4 programs, I'm googling now and some look useful. Thanks! Possibly saving me some time in my next project...
Thanks! I was just following the thread about their recent addition of OpenCL support and was on the verge of trying it out last weekend. I'll definitely continue once I'm home again!
Save yourself some time and buy a 4090 if you really want to be high-tier (consumer range); you will have a much faster experience, and not only with text, but also Stable Diffusion etc.
This model is better than many other contenders, but still far from GPT-4.
"what famous brands are there which change one letter from a common word to make a non-existent, but a catchy name, such as "musiq" instead of "music".. etc?"
There are several brands that have played with words by changing a letter or adding a letter to create a new and memorable name. Here are a few examples:
Qatar Airways - This airline's name was inspired by the country of Qatar, but with the letter "Q" replaced by "Q" to create a unique and catchy name.
Lush Cosmetics - This cosmetics company's name is a play on the word "lush," which means "profuse" or "luxuriant."
Pinterest - This social media site's name is a combination of the words "pin" and "interest," but with the letter "e" replaced by "i" to create a more unique and memorable name.
Tumblr - This social media platform's name is a play on the words "tumble" and "blogger," but with the letter "l" replaced by "r" to create a more unique and catchy name.
---------
that is compared to the hilarious Yi 34B Chat below: =)
There are several famous brands that have used this strategy of changing one letter from a common word to create a unique and catchy name. Here are a few examples:
eBay - Instead of "cBay," which would be a reference to "crab," eBay was named after the first three letters of the word "baY" and the "e" from the word "electronic," as in "electronic marketplace."
Google - Originally named "BackRub," the company changed its name to Google, a play on the word "googol," which refers to the number one followed by one hundred zeros, expressing the vastness of the internet.
Kodak - The name Kodak is derived from the word "cod" with an "k" added. It was chosen because it is short, easy to remember, and not suggestive of the photographic nature of the business.
But both are completely wrong! And technically the Google example is closer to correct than any of the others.
The Yi 34B eBay and Kodak examples are both (wrong but) very interesting because it does seem to get the idea of changing one letter.
Of the GPT-4 examples, the Qatar one (replacing "Q" with "Q" !?) is the only one that is internally consistent. The Pinterest and Tumblr examples are wrong in very odd ways, in that the explanation doesn't match the spelling.
Here is my response from Yi, which is actually pretty good:
One well-known example is the brand Apple. The company's name comes from a play on the phrase "Apple of My Eye." Another example is Google, which is derived from the mathematical term "googolplex," which refers to the number 1 followed by 100 zeroes. It was originally meant to be a play on the word "googol" (1 followed by 100 zeroes), but a spelling mistake led to the creation of the new word "google."
Other examples include PepsiCo, which combines the words "pepsin" and "kola," and Amazon, which was inspired by the world's largest river, the Amazon River.
In the case of "musiq," it seems like a made-up name without any specific reference or known origin. However, some possible variations could be created using existing words, such as "dramatize" instead of "dramatize" or "fashin" instead of "fashion."
Actually I just realized I had the prompt template and sampling wrong. First shot once that's fixed:
"
Some examples of famous brands that use this strategy include:
1. KFC (Kentucky Fried Chicken) - The brand takes the first letter of each word in its full name (Kentucky Fried Chicken) and combines them to create a memorable acronym.
2. Pepsi - The brand's name is a play on the word "pepsin," an enzyme used in digestion. The name was chosen because it has a positive connotation and is easy to remember.
3. Adidas - The brand's name is derived from the founder's name, Adolf Dassler, with the 'Das' replaced by an 'a'.
4. Asics - The brand's name is an acronym for "Animal, Speed, Intelligence, and Spirit."
5. H&M (Hennes & Mauritz) - The brand's name is a combination of two Swedish words: "Hennes" means "her" in English, and "Mauritz" refers to the men's clothing store "Mauritz Widforss."
6. Nike - The brand's name comes from the Greek goddess of victory, "Nike."
7. Amazon - The brand's name was inspired by the South American river Amazon, which is one of the world's largest rivers.
8. Apple - The brand's name was chosen due to co-founder Steve Jobs' belief that "Apple" was simple, memorable, and had a positive connotation.
9. Google - The brand's name is a play on the mathematical term "googol," which represents the number 1 followed by 100 zeros.
10. Twitter - The brand's name is a shortened version of "tweet," which refers to short messages sent through the platform.
These brands demonstrate how changing or combining letters from existing words can create unique and memorable names for businesses.
"
- This is not vanilla Yi, but a merge of several finetunes.
LLMs work on tokens, where individual characters are hidden away. They'd have to be explicitly trained to spell each token out into single-letter tokens and, since they're bad at information symmetry, to map single-letter tokens back onto tokens as well. I don't think anybody does this, so they're left with only what's in the training data. Otherwise they have no chance to reconstruct this information, since tokens could just as well map to some equivalent visual language, i.e. something looking like hieroglyphs, where the questions you're asking would be nonsensical.
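You can see this directly with a tokenizer. A quick sketch, assuming the `tiktoken` package (the tokenization used by recent OpenAI models) is installed:

    # The model never "sees" the letters of a word, only opaque token IDs.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    for word in ["music", "musiq", "Pinterest"]:
        ids = enc.encode(word)
        pieces = [enc.decode([i]) for i in ids]
        print(f"{word!r} -> token ids {ids} -> pieces {pieces}")

    # Common words usually map to one or two tokens, and the pieces don't line
    # up with individual letters, which is why "change one letter" questions
    # are hard for the model to answer from the spelling itself.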
I thought so, too. But then I asked it to define fake words that were portmanteaus I made up. Believe me, my understanding of BERT and discriminative models aligned perfectly with what you're saying. But testing the theory that it can break down and make meaning of fake words, with accurate descriptions of which words I'm combining, proved me wrong. Generative models must work differently than you and I thought.
Btw, the other day I uploaded ~4k chats from your llm DB to the code interpreter and had it label them. Worked pretty well. I only gave them one label each at first, then I started working on more complex labeling, but then my poor assistant ran out of steam and the interpreter session expired. So much pleasure and pain! Love your llm CLI though. Thanks.
"Cheeseface just dropped the Blippy-7B model which is almost as good as the twinamp 34B model on the SwagCube benchmark when run locally as int8 and this shows that the gains made by the skibidi-70B model will probably filter down to the baseline Eras models in the next few weeks"
That's giving the community way too much credit for organization.
Everyone just seems to be running experiments independently and then randomly dropping some results, with basically no documentation. Sometimes the motivation is clearly VC money or paper exposure, but sometimes there is no apparent motivation... or even no model card. Then, when something works, others copy the script.
Not that I don't enjoy it. I find the sea of finetune generations fascinating.
Is there a centralized list somewhere that tests "use this for x purpose, use that for y?"