Here's a glossary to understand this post: - mixtral-8x7 or 8x7: Open source mod...

vincentrolfs · on Jan 11, 2024

I asked ChatGPT to rewrite the original post using your glossary, which worked well:

I've set up my system to use several AI models: the open-source Mixtral-8x7, Dolphin (an uncensored version of Mixtral), GPT-3.5 Turbo (a cost-effective option from OpenAI), and the latest GPT-4 Turbo from OpenAI. I can easily compare their performances in Emacs. Lately, I've noticed that GPT-4 Turbo is starting to outperform Mixtral-8x7, which wasn't the case until recently. However, I'm still waiting for access to Mistral-Medium, a new, more exclusive AI model by Mistral AI.

I just found out that Perplexity, a new search engine competing with Google, is offering free access to Mistral Medium through their partnership. This makes me question Sam Altman, the CEO of OpenAI, and his claims about their technology. Mistral Medium seems superior to GPT-4 Turbo, and if it were expensive to run, Perplexity wouldn't be giving it away.

I'm guessing that Mistral AI could become the next Renaissance Technologies (a hedge fund known for its innovative strategies) of the AI world. Techniques like Direct Preference Optimization, which improves smaller models, along with other advancements like the Alibi Python library for understanding AI models, sliding windows for longer text sequences, and combining multiple models, are now well understood. The real opportunity lies in quickly adapting these new technologies before they become mainstream and affordable.

Big companies are cautious about adopting these new structures, remembering their dependence on Microsoft in the past. They're willing to experiment with AI until it becomes both affordable and easy to use in-house.

It's sad to see the old technology go, but exciting to see the new advancements take its place.

benreesman · on Jan 11, 2024

The GP did a great job summarizing the original post and defining a lot of cryptic jargon that I didn't anticipate would generate so much conversation, and I'd wager did it without a blind LLM shot (though these days even that is possible). I endorse that summary without reservation.

And the above is substantially what I said, and undoubtedly would find a better reception with a larger audience.

I'm troubled though, because I already sanitize what I write and say by passing it through a GPT-style "alignment" filter in almost every interaction precisely because I know my authentic self is brash/abrasive/neuro-atypical/etc. and it's more advantageous to talk like ChatGPT than to talk like Ben. Hacker News is one of a few places real or digital where I just talk like Ben.

Maybe I'm an outlier in how different I am and it'll just be me that is sad to start talking like GPT, and maybe the net change in society will just be a little drift towards brighter and more diplomatic.

But either way it's kind of a drag: either passing me and people like me through a filter is net positive, which would suck but I guess I'd get on board, or it actually edits out contrarian originality in toto, in which case the world goes all Huxley really fast.

Door #3 where we net people out on accomplishment and optics with a strong tilt towards accomplishment doesn't seem to be on the menu.

cgeier · on Jan 11, 2024

I would have said there is no problem with your style (nothing brash/abrasive), but you used a lot of jargon, that people who are not very deep into LLMs (large language models) would not understand. Interests of hackernews visitors are very diverse, not everyone follows LLMs that closely.

smeej · on Jan 11, 2024

This was my take exactly. I read the original and thought, "Wow, this sounds like really interesting stuff this poster us excited about. I wish I knew what the terms meant, though. I'll have to come back to this when I have more time and look up the terms."

I was pleasantly surprised to find a glossary immediately following, which tells me it wasn't the tone of the post, but the shorthand terminology that was unfamiliar to me that was my issue.

I think writing in "Ben's voice" is great. There are just going to be times when your audience needs a bit more context around your terminology, that's all.

mwigdahl · on Jan 11, 2024

I think the only thing you really need to do is unpack your jargon so people who aren't exactly you can understand what you're saying. Even on this site, there are folks with all sorts of different experiences and cultural context, so shortcuts in phrasing don't always come across clearly.

For example, "in which case the world goes all Huxley really fast." "Huxley" apparently means something to you. Would it mean anything at all to someone who hasn't read any Aldous Huxley? As someone who _has_, I still had to think about it -- a lot. I assumed you're referring to a work of his literature rather than something he actually believed, as Huxley's beliefs about the world certainly had a place for the contrarian and the original.

Further, I assume you are referring to his most well-known work, _Brave New World_, rather than (for example) _Island_, so you're not saying that people would be eating a lot of psychedelic mushrooms and living together in tolerant peace and love.

I don't at all think you need to sound like GPT to be a successful communicator, but you will be more successful the more you consider your audience and avoid constructions that they're unlikely to be able to understand without research.

NeutralCrane · on Jan 11, 2024

People aren’t passing you through a filter because you are brash and undiplomatic and “unaligned”, it’s because your communication style is borderline incomprehensible.

engcoach · on Jan 11, 2024

FWIW I found it comprehensible just jargon-laden

a_petrov · on Jan 11, 2024

I used to struggle a lot in communication for talking to people in the authentic self way you just described. Being too direct and telling my point of view in such a way has caused tension with family, colleagues and the girlfriend.

The moment I change the way I talk and say instead of "That's bullshit, let's move away from it" to "That could be a challenging and rewarding experience", and I can already see the advantage.

I rather like to talk the way I want, but I see it as challenging and not that rewarding as people seem to get more sensitive. That made me wonder if the way GPT-style chatbots communicate with humans would make humans expect the same way of communication from other humans.

zemvpferreira · on Jan 11, 2024

Porque no los dos? While I truly appreciate your OP and could grok it even though I don't know the tech, the summary and rewrites saved me a ton of googling. I hope one day we have a 'see vernacular/original' button for all thought and communication so people can choose what level to engage in without the author having to change their communication style. Translation for personal dialects, so to say.

boringg · on Jan 11, 2024

Real Ben >> GPT Ben. However if you are going to the wider world you problem need to self varnish a lot (i know i would have to). You are fine in here!

What you are alluding to is quite similar to the that “instagram face” that everyone pursues and self filters for except its more about your communication and thoughts. Also the argument that you need to reach a wider audience i dint think isn't necessary unless you want the wider audience to comment and engage.

The internet is the great homogenizer soon(ish) we will be uniform.

drittich · on Jan 11, 2024

I think this is just in the short-term. In the long term GPTs will retain our personality while making the message more understandable - which I think is the most important thing. Although McLuhan would disagree. Benefits, though, might include AI making cross-cultural translation so you can converse with someone with a different language and very different experiences and still understand each other. I think that's good? Maybe?

yashap · on Jan 11, 2024

Personally I had no problem with your post being too abrasive or anything like that, it was just hard to understand because it was full of jargon.

euiq · on Jan 12, 2024

Your posts are my favorite thing about Hacker News, both because of the things you're saying and the way you're saying them; please don't let anyone tell you otherwise.

ComplexSystems · on Jan 11, 2024

Thank you! Amazing how difficult it is to keep up with all of the new jargon given how fast it's evolved. I had no idea that mistral-medium was so great.

neals · on Jan 11, 2024

Crazy, your post feels like downloading martial arts in the Matrix. I read the parent, didn't get a thing and though the guy was on substances. Read yours. Read the parent again. I speak AI now! I'm going to use this new power to raise billions!

benreesman · on Jan 11, 2024

I'm clearly spending far too much time tuning/training/using these things if a glossary to make my post comprehensible to HN is longer than my remark: thank you for correcting my error in dragging this sub-sub-sub-field into a thread of general interest.

azeirah · on Jan 11, 2024

That's an impressive list of jargon whaha

Love how deep the rabbithole has gone in just a year. I am unfortunately in the camp of understanding the post without needing a glossary. I should go outside more :|

Smerity · on Jan 11, 2024

I think you've done a great explanation expansion except I believe it's ALiBi ("Attention with Linear Biases Enables Input Length Extrapolation"), a method of positional encoding (i.e. telling the Transformer model how much to weight a distant token when computing the current output token). This has been used on various other LLMs[2].

[1]: https://arxiv.org/abs/2108.12409

[2]: n.b. Ofir Press is co-creator of ALiBi https://twitter.com/OfirPress/status/1654538361447522305

benreesman · on Jan 11, 2024

This is indeed what I was referring to and along with RoPE and related techniques is a sort of "meta-attention" in which a cost-effective scalar pointwise calculation can hint the heavyweight attention mechanism with super-linear returns in practical use cases.

In more intuitive terms, your bog-standard transformer overdoes it in terms of considering all context equally in the final prediction, and we historically used rather blunt-force instruments like causally masking everything to zero.

These techniques are still heuristic and I imagine every serious shop has tweaks and tricks that go with their particular training setup, but the Rope shit in general is kind of a happy medium and exploits locality at a much cheaper place in the overall computation.

lhl · on Jan 11, 2024

My understanding is that Mistral uses a regular 4K RoPE that is "extends" the window size with SWA. This is based on looking at the results of Nous Research's Yarn-Mistral extension: https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k and Self-Extend, both of which only apply to RoPE models.

There are quite a few recent attention extension techniques recently published:

* Activation Beacons - up to 100X context length extension in as little as 72 A800 hours https://huggingface.co/papers/2401.03462

* Self-Extend - a no-training RoPE modification that can give "free" context extension with 100% passkey retrieval (works w/ SWA as well) https://huggingface.co/papers/2401.01325

* DistAttention/DistKV-LLM - KV cache segmentation for 2-19X context length at runtime https://huggingface.co/papers/2401.02669

* YaRN - aforementioned efficient RoPE extension https://huggingface.co/papers/2309.00071

You could imagine combining a few of these together to basically "solve" the context issue while largely training for shorter context length.

There are of course some exciting new alternative architectures, notably Mamba https://huggingface.co/papers/2312.00752 and Megabyte https://huggingface.co/papers/2305.07185 that can efficiently process up to 1M tokens...

Kerbonut · on Jan 11, 2024

imo mistral-medium is worse than mixtral. Do you have API access?

EmilStenstrom · on Jan 11, 2024

Thank you!

spuz · on Jan 11, 2024

As someone who follows AI pretty closely, this was unbelievably helpful in understanding the parent post. It's crazy how much there is to keep on top of if you don't want to fall behind everything that is going on in AI at the moment.

pandemic_region · on Jan 11, 2024

Did you just paste that into an LLM and asked it to create a glossary? :-P

(but seriously: Thanks !)

coldtea · on Jan 11, 2024

Emil didn't, but I did (and yeah, it's useless):

Mixtral-8x7: This appears to be a technical term, possibly referring to a software, framework, or technology. Its exact nature is unclear without additional context.

Dolphin locally: "Dolphin" could refer to a software tool or framework. The term "locally" implies it is being run on a local machine or server rather than a remote or cloud-based environment.

3.5-turbo: This could be a version name or a type of technology. "Turbo" often implies enhanced or accelerated performance.

4-series preview: Likely a version or iteration of a software or technology that is still in a preview or beta stage, indicating it's not the final release.

Emacs: A popular text editor used often by programmers and developers. Known for its extensibility and customization.

Mistral Medium: This might be a product or service, possibly in the realm of technology or AI. The specific nature is not clear from the text alone.

Perplexity: Likely a company or service provider, possibly in the field of AI or technology. They seem to have a partnership offering involving Mistral Medium.

RenTech of AI: RenTech, or Renaissance Technologies, is a well-known quantitative hedge fund. The term here is used metaphorically to suggest a pioneering or leading position in the AI field.

DPO, Alibi, and sliding window: These are likely technical concepts or tools in the field being discussed. Without additional context, their exact meanings are unclear.

Modern mixtures: This could refer to modern algorithms, techniques, or technologies in the field of AI or data science.

TheBloke: This could be a reference to an individual, a role within a community, or a specific entity known for certain expertise or actions.

4070 Super: This seems like a model name, possibly of a computer hardware component like a GPU (Graphics Processing Unit).

MSFT: An abbreviation for Microsoft Corporation.

On-premise: Refers to software or services that are operated from the physical premises of the organization, as opposed to being hosted on the cloud.

aftoprokrustes · on Jan 11, 2024

This is actually hilarious. It looks like a student who did not learn for the exam but still tries their best to scratch a point or two by filling the page with as many reasonnable sounding statements (a.k.a. "bullshit") as they can. Not that I expect more of a language model, no matter how "large".

siva7 · on Jan 11, 2024

This makes it actually more human (as in this is how a real student would have reacted) and therefore AGI-like than i would have anticipated.

PoignardAzur · on Jan 11, 2024

> Not that I expect more of a language model, no matter how "large".

That's a weirdly dismissive statement. The fundamental problem is that a lot of these terms are from after the AI's cutoff point. It's perfectly able to handle terms like "Emacs", "RenTech" or "MSFT", and it can guess that "4070 Series" probably refers to a GPU.

ChatGPT in a few years will probably be perfectly able to produce the correct answers.

(Actually, ChatGPT consistently claims its current cutoff is April 2023, which should let it give a better answer, so I'm taking a few points off my explanation. But it still feels like the most probable one.)

fragmede · on Jan 11, 2024

GPT4 is able to look terms up on the Internet if you ask, and will give you a list of specs on it, with a cite so you know it's not hallucinating them.

"4070 Super": https://chat.openai.com/share/0aac7d90-de65-41d0-9567-8e56a0...

"Mixtral-8x7": https://chat.openai.com/share/8091ac61-d602-414c-bdce-41b49e...

rrr_oh_man · on Jan 11, 2024

I love you, Emil

hmottestad · on Jan 11, 2024

Thanks for this. I was initially wondering what this new GPT 4.5 model was and if I had somehow missed out on something big.