Hacker News | jafitc's comments

I think you should consider trimming that file.

Exclude movies with a very low number of ratings, and potentially very low scores too.

The long-tail reduction would be significant.


I initially loved looking for obscure stuff, e.g. setting the region to Soviet Union. It surely is the case that 99% of users want at most 10% of the data. I'll have to work on the ability to select the file and download & cache it only if the relevant query asks for it.

"People are really bad at understanding just how big LLM's actually are. I think this is partly why they belittle them as 'just' next-word predictors"

https://nitter.net/jam3scampbell/status/1748200331215835561


Deepinfra's Mixtral is $0.27/M tokens, as per their website.


Hey, yep looks like they updated their pricing - we've now updated it on the site!


It's important to note that this model excels in reasoning capabilities.

But it was deliberately not trained on the big "web crawled" datasets, so that it wouldn't learn how to build bombs etc., or be naughty.

So it is the "smartest thinking" model in its weight class, or even comparable to higher-param models, but it is not as knowledgeable about the world and trivia.

This might change in the future, but it is the current state.


But that still makes it great for RAG applications, where I want the answer to be based on my data, not on whatever it learned from the web.


Interesting. Anyone tried / benchmarked this for RAG?


yeah it's good. you'd want* to finetune this before using it (c.f. my reply to "it's depressed and insults me for no reason whatsoever?" @ https://huggingface.co/microsoft/phi-2/discussions/61)

* by want, I mean need. People self-peasantized heavily on "censored models" and don't really understand how these work, and the SNR is out of whack because there are 100,000x more waifu creators and culture warriors than knowledgeable people sharing on this subject.


If you think LLMs have basically two properties, the ability to use natural language and the knowledge to answer questions, then small language models should be seen as simply excellent at natural language. And that's great, because for many tasks general knowledge is not needed, especially for RAG.
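To make the "language ability without knowledge" point concrete, here is a minimal sketch of how a RAG prompt grounds the model in retrieved text rather than in its own memorized knowledge, so the model only has to read, not know. The function name, question, and passages below are hypothetical placeholders, not any real library's API:

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Ground the model in retrieved text instead of its own memory."""
    # Number each retrieved passage so answers can cite their source.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# Hypothetical retrieved passages standing in for "my data".
prompt = build_rag_prompt(
    "When was the warehouse built?",
    ["The warehouse at 12 Dock Rd was built in 1928.",
     "It was converted to offices in 2003."],
)
```

With a prompt like this, the heavy lifting is retrieval; the model's job reduces to reading comprehension, which is exactly what a small model can still do well.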


Which more or less mirrors the edges of human learning.

If someone read a set of dictionaries and then went to talk to actual people... you'd get about the same result:

e.g. complete obliviousness to colloquialisms, etc.


> This might change in the future but it is the current state

I hope it doesn't change. The focus of a model shouldn't be to embed data. Retrieval is a better method to provide data to a model, and leads to fewer "sounds smart" but very wrong results.

Having less data embedded also means the model is more generally usable outside the realm of chat assistants, where you only want the model to be aware of the data you provide it. One example could be games: in a medieval fantasy setting, it would be really weird if you could get a character to start talking to you about US politics. That probably still wouldn't work with Phi-2 without fine-tuning (as I imagine it does have some data about US politics embedded), but I hope it illustrates the point.


> But it was on purpose not trained on the big “web crawled” datasets to not learn how to build bombs etc, or be naughty.

It wasn't trained on web-crawled data to make it less obvious that Microsoft steals property and personal data to monetise it.


It was trained on "textbook quality" synthetic data + some high quality web data.

The question is - if we train a model on synthetic data generated by GPT-4 which has copyright issues, what is the status of this model? Will MS have to delete it as well? And all models trained with GPT-4 data?


> if we train a model on synthetic data generated by GPT-4 which has copyright issues

Is that the new directive from HQ? I see a lot of folks parroting this logic, ignoring that proceeds of crime are criminal themselves.


Do you think ISIS is bound by the words "non-commercial" in a license file when they have the source anyway?

It was available even before this; all they changed is that law-abiding citizens can put apps in the App Store and charge money for them.

(More importantly, law-abiding companies can build on and fine-tune it in hopes of profit.)


I haven't said anything regarding a license - where did you get that from?

ISIS etc. can easily abuse an open-source model, whereas abusing a closed-source model running in the cloud, e.g. ChatGPT 4, is a lot harder.


This "vibe" check that it's even better than GPT-4 Turbo is not what its Elo rating shows on the Chatbot Arena, which is based not on 1 vote but on thousands of user votes. GPT-4 (Turbo) is still in a league of its own.


By its nature, that site isn't very representative of how the models perform in real-world use.


That depends on what real world use you're targeting, but unfortunately I'm not aware of anything better than that leaderboard in terms of sample size and model coverage.


The Elo leaderboard, you mean?


The vibe check is for Pro, though. I want to see how Ultra is benchmarked.


This is based on users choosing the better of 2 models at a time, and calculating an Elo rating from who beats whom.

BYOT - bring your own tests style.

It gives a better picture of real-world performance and is more robust against contamination.

They collected over 6,000 votes for Mixtral-8x7B and 1,500 for Gemini Pro.
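For anyone unfamiliar with how pairwise votes become a ranking, here is a minimal illustrative sketch of the classic online Elo update. (This is a simplification for intuition: lmsys actually fits ratings with a statistical model and reports bootstrapped confidence intervals, as their disclaimer below notes.)

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32) -> tuple[float, float]:
    """Return new ratings after one head-to-head user vote."""
    e_a = expected_score(r_a, r_b)
    score = 1.0 if a_won else 0.0
    # Winner gains what the loser loses, scaled by how surprising the result was.
    return r_a + k * (score - e_a), r_b + k * ((1 - score) - (1 - e_a))

# Hypothetical vote stream: True means "model A won this comparison".
ratings = {"model_a": 1000.0, "model_b": 1000.0}
for a_won in [True, True, False, True]:
    ratings["model_a"], ratings["model_b"] = update(
        ratings["model_a"], ratings["model_b"], a_won
    )
```

Note that each update transfers rating points between the two models, so the total is conserved; with thousands of votes the ratings converge toward each model's true win probability against the field.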

While Elo ratings are widely used to rank performance in chess or among sports teams, here's a disclaimer by the makers of the leaderboard:

---

> Please note Arena is a "live eval" and pretty much a sampling process to estimate models capability.

> That's why we show the confidence intervals through bootstrapping. Statistically, these models (e.g., GPT-3.5, Mixtral, Gemini Pro) are very close and only looking at their ranking can be misleading.

https://twitter.com/lmsysorg/status/1735729398672716114

https://twitter.com/lmsysorg/status/1735751052287226059


from the announcement tweet: https://twitter.com/rasbt/status/1735293149965062476

---

So, we've been quietly building something new for running AI experiments and deploying models ...

Our Lightning AI Studios let you switch between different machines and GPUs flexibly in the same environment without any setup steps.

Everything can be accessed via your browser and supports

  - VSCode
  - Jupyter Notebook
  - a regular terminal
  - a control pane for multi-node jobs 
  - ... many, many collaborative and extra features
And there's no installation or setup step required at all.

It's basically what I've been using internally as a productivity tool for the last few months to run AI experiments.

(*there's also a demo video in the linked tweet)

---

A persistent GPU cloud environment.

Code online. Code from your local IDE. Prototype. Train. Serve. Multi-node. All from the same place.

No credit card. 6 Free GPU hours/month.


Important note: Bing's Balanced mode (the default) uses GPT-3.5.

Only the Precise and Creative modes use GPT-4.

https://twitter.com/emollick/status/1732495030143549541

Also see:

An Opinionated Guide to Which AI to Use: ChatGPT Anniversary Edition

https://www.oneusefulthing.org/p/an-opinionated-guide-to-whi...


All I can say is it's really fast.

