I initially loved looking for obscure stuff, e.g. setting the region to Soviet Union. It's surely the case that 99% of users want* 10% of the data at most. I'll have to work on the ability to select the file and download & cache it only when a relevant query asks for it.
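A minimal sketch of what that lazy download-and-cache flow could look like (the URL, file naming, and cache location are hypothetical, not from any actual project):

```python
import urllib.request
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "region-data"   # hypothetical cache location
BASE_URL = "https://example.com/datasets"            # hypothetical download host

def get_region_file(region: str) -> Path:
    """Fetch a per-region data file on first use, then serve it from the local cache."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    local = CACHE_DIR / f"{region}.jsonl"
    if not local.exists():
        # Only download once a query actually asks for this region.
        urllib.request.urlretrieve(f"{BASE_URL}/{region}.jsonl", local)
    return local

# get_region_file("soviet-union") downloads once; later calls hit the cache.
```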
It's important to note that this model excels at reasoning.
But it was deliberately not trained on the big "web crawled" datasets, so it wouldn't learn how to build bombs etc., or otherwise be naughty.
So it is the "smartest thinking" model in its weight class, even comparable to higher-parameter models, but it is not as knowledgeable about the world and trivia.
This might change in the future but it is the current state.
* by want, I mean need. People have self-peasantized heavily over "censored models" and don't really understand how these work, and the SNR is out of whack because there are 100,000x more waifu creators and culture warriors than knowledgeable people sharing on this subject.
If you think of LLMs as having basically two properties - the ability to use natural language, and knowledge to answer questions - then small language models should be seen as simply excellent at natural language. That's great, because for many tasks general knowledge is not needed, especially for RAG.
> This might change in the future but it is the current state
I hope it doesn't change. The focus of a model shouldn't be to embed data. Retrieval is a better method of providing data to a model, and leads to fewer "sounds smart" but very wrong results.
Having less data embedded also means the model is more generally usable outside the realm of chat assistants, where you only want the model to be aware of the data you provide it. One example could be games: in a medieval fantasy setting, it would be really weird if you could get a character to start talking to you about US politics. That probably still wouldn't work with Phi-2 without fine-tuning (as I imagine it does have some data on US politics embedded), but I hope it illustrates the point.
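To make the retrieval point concrete, here's a minimal RAG-style sketch. The embed() function is a toy stand-in for a real embedding model, and llm is whatever small model you call; none of this is a specific library's API:

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model: a deterministic pseudo-random unit vector."""
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).normal(size=384)
    return v / np.linalg.norm(v)

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Return the k documents most similar to the query by cosine similarity."""
    q = embed(query)
    return sorted(docs, key=lambda d: float(q @ embed(d)), reverse=True)[:k]

def answer(query: str, docs: list[str], llm) -> str:
    """Give the model only the retrieved passages; it needs no embedded world knowledge."""
    context = "\n".join(retrieve(query, docs))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```

Swap embed() for a real sentence-embedding model and the ranking becomes meaningful; the point is that the facts live in docs, not in the model's weights.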
It was trained on "textbook quality" synthetic data + some high quality web data.
The question is - if we train a model on synthetic data generated by GPT-4 which has copyright issues, what is the status of this model? Will MS have to delete it as well? And all models trained with GPT-4 data?
This "vibe" check that it's even better than GPT-4 Turbo is not what its Elo rating shows on the Chatbot Arena based on not 1 but thousands of user votes.
GPT-4 (Turbo) is in a league of its own still.
That depends on what real-world use you're targeting, but unfortunately I'm not aware of anything better than that leaderboard in terms of sample size and model coverage.
This is based on users choosing the better of two models at a time, and calculating an Elo rating from who beats whom.
BYOT - bring-your-own-tests style.
It gives a better picture of real-world performance and is more robust against contamination.
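For reference, the per-vote Elo update itself is simple. A sketch (the K-factor and starting rating are illustrative; the Arena's exact parameters may differ):

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0) -> tuple[float, float]:
    """Update two ratings after one head-to-head vote (first model beat the second)."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)  # the winner gains exactly what the loser loses
    return r_winner + delta, r_loser - delta

# Start every model at the same rating and replay the votes in order:
ratings = {"model_a": 1000.0, "model_b": 1000.0}
ratings["model_a"], ratings["model_b"] = elo_update(ratings["model_a"], ratings["model_b"])
```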
They collected over 6000 votes for Mixtral-8x7B and over 1500 for Gemini Pro.
While Elo ratings are widely used to rank performance in chess or among sports teams, here's a disclaimer by the makers of the leaderboard:
---
> Please note Arena is a "live eval" and pretty much a sampling process to estimate models capability.
> That's why we show the confidence intervals through bootstrapping. Statistically, these models (e.g., GPT-3.5, Mixtral, Gemini Pro) are very close and only looking at their ranking can be misleading.
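A rough sketch of what "confidence intervals through bootstrapping" means here, reusing elo_update from the sketch above: resample the battle log with replacement many times, recompute the ratings each time, and report percentiles. This is an illustration of the technique, not the leaderboard's actual code:

```python
import random

def bootstrap_ci(battles, n_boot=1000, alpha=0.05):
    """battles: list of (winner, loser) pairs. Returns per-model (low, high) rating bounds."""
    samples = {}
    for _ in range(n_boot):
        resampled = random.choices(battles, k=len(battles))  # sample with replacement
        ratings = {}
        for winner, loser in resampled:
            rw = ratings.setdefault(winner, 1000.0)
            rl = ratings.setdefault(loser, 1000.0)
            ratings[winner], ratings[loser] = elo_update(rw, rl)
        for model, r in ratings.items():
            samples.setdefault(model, []).append(r)
    bounds = {}
    for model, rs in samples.items():
        rs.sort()
        bounds[model] = (rs[int(alpha / 2 * len(rs))], rs[int((1 - alpha / 2) * len(rs)) - 1])
    return bounds
```

Overlapping intervals for two models mean their ranking order isn't statistically meaningful, which is exactly the caveat above.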
Exclude movies with a very low number of ratings, or potentially very low scores too.
The long tail reduction would be significant.
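Something like this, with made-up column names and thresholds (pandas, assuming a flat ratings table):

```python
import pandas as pd

movies = pd.read_csv("ratings.csv")  # hypothetical file with num_votes and avg_rating columns

# Drop the long tail: barely-rated titles and very low scorers.
kept = movies[(movies["num_votes"] >= 100) & (movies["avg_rating"] >= 2.0)]
print(f"kept {len(kept)} of {len(movies)} titles")
```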