
I would expect them to use small sizes for almost all the testing.



Yes. There _is_ a need to train LLMs more than once, and training is prohibitively expensive, so you need workarounds such as training on a small subset of the data or training a smaller version of the model. We're not yet at the point where a CS student on consumer hardware could afford to do this kind of research.
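
For concreteness, here's a minimal sketch of the "smaller version of the model" workaround: a toy transformer trained on a toy data subset, so the whole loop runs in minutes on one consumer GPU or even a CPU. Everything here (the TinyLM class, the dimensions, the synthetic data) is an illustrative assumption, not from any particular codebase.

    import torch
    import torch.nn as nn

    class TinyLM(nn.Module):
        def __init__(self, vocab_size=1000, d_model=64, n_layers=2, n_heads=4):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               dim_feedforward=4 * d_model,
                                               batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.head = nn.Linear(d_model, vocab_size)

        def forward(self, x):
            # causal mask so each position only attends to earlier tokens
            n = x.size(1)
            mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
            return self.head(self.blocks(self.embed(x), mask=mask))

    # stand-in for "a small subset of data": a few thousand random sequences
    data = torch.randint(0, 1000, (2048, 33))

    model = TinyLM()
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(200):
        batch = data[torch.randint(0, len(data), (32,))]
        inputs, targets = batch[:, :-1], batch[:, 1:]   # next-token prediction
        logits = model(inputs)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

The point isn't the architecture; it's that you can debug data pipelines, loss curves, and training code at this scale before committing to an expensive full-size run.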


> We're not yet at the point where a CS student on consumer hardware could afford to do this kind of research.

Okay. But I was saying someone with millions of dollars to spend could do it. And then another poster was arguing that millions of dollars was not enough to be viable because you need lots of repeated runs.

Nobody was saying a student could train one of these models from scratch. The cool potential is for a student to run one, and maybe fine-tune it.
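
As a rough sketch of what that student-scale fine-tune can look like: load a small pretrained checkpoint, freeze most of its weights, and train only the final block on a handful of examples. The choice of gpt2 and of which layers to unfreeze are just assumptions for illustration.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    tok.pad_token = tok.eos_token          # gpt2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # freeze everything, then unfreeze only the final transformer block
    for p in model.parameters():
        p.requires_grad = False
    for p in model.transformer.h[-1].parameters():
        p.requires_grad = True

    opt = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=5e-5)

    # a stand-in fine-tuning "dataset" of two sentences
    texts = ["an example sentence in the target style.",
             "another example sentence in the target style."]
    batch = tok(texts, return_tensors="pt", padding=True)

    model.train()
    for step in range(10):
        out = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["input_ids"])   # causal-LM loss computed internally
        out.loss.backward()
        opt.step()
        opt.zero_grad()

Something in this spirit fits comfortably on a single consumer GPU; it's the from-scratch pretraining runs, repeated many times, that stay out of reach.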


Here is the upthread comment I was responding to:

> Why would you want to retrain it from scratch every day?

I was explaining why someone might want to retrain it more than once (although not literally every day).



