Hacker News

You can do it once, but probably not every day.



Why would you want to retrain it from scratch every day? Stable Diffusion doesn't do that either.


Well maybe not every day, but having a short feedback loop and the ability to run your code multiple times with different variations is generally considered to be a prerequisite for software development. If you actually want to keep developing the model, you need the funding to be able to train it more than once.


To summarize this discussion, we went from "this might mean we don't need a fleet of $10k+ GPUs to even run an LLM" to "yeah but an individual couldn't train one every day though". These goalposts are breaking the sound barrier.


>but having a short feedback loop and the ability to run your code multiple times with different variations is generally considered to be a prerequisite for software development

This is not "software development" in general, this is LLM training.

It's not like you're building some regular app, api, or backend.


If you are claiming that training an LLM literally only one time is enough and there is no need to train it more than once, you are wrong. The researchers who created OPT didn't go into a basement for 12 months, then come out, train their model once, hit publish, and go for coffee. That is a fantasy. Likewise, if a CS student wants to dabble in this research, they need the ability to train more than once.

I'm not gonna engage in a rhetorical argument about whether this should be called "software development" or "LLM development" or something else. That's unrelated to the question of how much training is required.


>If you are claiming that training an LLM literally only one time is enough and there is no need to train it more than once, you are wrong.

No, I'm rather claiming that what you claimed is wrong in the context of LLM training: "Well maybe not every day, but having a short feedback loop and the ability to run your code multiple times with different variations is generally considered to be a prerequisite for software development".

LLM training is not the same as writing a program and "running your code with different variations". For an LLM you don't need to quickly rerun everything with some new corpus - it would be nice, but it's neither a prerequisite nor crucial for any current use.

Hell, it's not even a "prerequisite" in programming, just good to have. Tons of great programs have been written with very slow build times, without quick edit/compile/build/run cycles.


I wasn't talking about running the same code with a new corpus. For that kind of use case one can simply fine tune the pretrained model. The example I gave was "if a CS student wants to dabble in this research".

You said "LLM training is not the same as writing a program and running your code with different variations". How do you think these LLMs were made, seriously? Do you think Facebook researchers sat down for 12 months and wrote code non-stop without compiling it once, until the program was finished and was used to train the LLM literally only one time?


I would expect them to use small sizes for almost all the testing.


Yes. There _is_ a need to train LLMs more than once, and training is prohibitively expensive, so you need workarounds such as training on a small subset of data, or a smaller version of the model. We're not yet at the point where a CS student on consumer hardware could afford to do this kind of research.
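One concrete way to see why a "smaller version of the model" makes iteration affordable: parameter count falls off a cliff as you shrink depth and width. A rough sketch using a standard GPT-style parameter estimate; the two configurations below are illustrative assumptions, not any lab's actual settings:

```python
# Rough transformer parameter count, assuming a standard GPT-style
# architecture: token embeddings plus per-layer attention (~4*d^2)
# and MLP (~8*d^2) weights. Ignores biases and layer norms.
def approx_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    embed = vocab_size * d_model
    per_layer = 12 * d_model ** 2
    return embed + n_layers * per_layer

full = approx_params(96, 12288, 50257)   # GPT-3-scale configuration
small = approx_params(12, 768, 50257)    # GPT-2-small-scale configuration
print(f"{full / 1e9:.0f}B vs {small / 1e6:.0f}M parameters")
```

A researcher debugging training code on the ~100M-parameter configuration is paying a tiny fraction of the full run's cost per experiment, which is why "train it once at full scale" isn't as absurd as it sounds.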


> We're not yet at the point where a CS student on consumer hardware could afford to do this kind of research.

Okay. But I was saying someone with millions of dollars to spend could do it. And then another poster was arguing that millions of dollars was not enough to be viable because you need lots of repeated runs.

Nobody was saying a student could train one of these models from scratch. The cool potential is for a student to run one, maybe fine tune it.


Here is the upthread comment I was responding to:

> Why would you want to retrain it from scratch every day?

I was explaining why someone might want to retrain it more than once (although not literally every day).


Because things happen every day. If ChatGPT wants to compete with Google, staying up to date with recent events is the minimum bar.


You wouldn't need to re-train from scratch for that, just fine-tune on the new data sources. I don't think constant re-training is the optimal strategy for that use-case anyway. Bing does it by letting the LLM search a more traditional web index to find the information it needs.
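Back of the envelope, the gap between a from-scratch run and a fine-tune is mostly the token count. A sketch using the common C ≈ 6·N·D rule of thumb (training FLOPs ≈ 6 × parameters × tokens); the token and parameter figures are illustrative assumptions:

```python
# Training compute rule of thumb: FLOPs ~= 6 * params * tokens.
def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

pretrain = train_flops(175e9, 300e9)  # GPT-3-scale pretraining run
finetune = train_flops(175e9, 100e6)  # fine-tune on a slice of new text
print(f"fine-tuning here is ~{pretrain / finetune:,.0f}x cheaper")
```

With these assumed numbers the fine-tune costs thousands of times less compute than the original run, which is why "keep it current" doesn't imply "retrain from scratch".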


Okay, but someone has to do the fine-tuning. The code has to be updated. Parts of the training have to be redone. All of this has costs. It isn't the "do it once and forget about it" task it's being touted as in this thread.


>The code has to be updated

I'm pretty sure this is not how an LLM works.

>It isn't the "do it once and forget about it" task it's being touted as in this thread.

That's neither here nor there. Training the LLM itself is not a "do it multiple times per day if you want to compete with Google" thing, as has been stated in this subthread.


> > The code has to be updated

> I'm pretty sure this is not how an LLM works.

You can say that about any software. "You can use this software perfectly well without ever updating it." Sure, you can do that, but typically people have lots of reasons to update software. An LLM isn't magic in this sense. An LLM does not mysteriously update its own code if you just wish hard enough. If you want to continue the development of the LLM then you need to make changes to the code, just like with any other software.


That's not what the training is about.

Things happen every day, but languages and words and their associations don't change in any measurable way every day...

This is not like web crawling...


That's not necessary. Look at how Bing works: it's an LLM that can trigger searches and then gets the search results fed back into its prompt.

I wrote about one way to implement that pattern here: https://simonwillison.net/2023/Jan/13/semantic-search-answer...
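That retrieval pattern fits in a few lines. A minimal sketch; `web_search` and `llm_complete` are hypothetical stand-ins for a real search API and model call, not any particular library:

```python
# Retrieval-augmented answering: instead of retraining the model on new
# events, run a search and paste the results into the prompt.
def answer_with_search(question, web_search, llm_complete, top_k=3):
    # web_search(question) -> list of dicts with a "snippet" field (assumed shape)
    hits = web_search(question)[:top_k]
    context = "\n\n".join(hit["snippet"] for hit in hits)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    # llm_complete(prompt) -> completion string from a frozen, pretrained model
    return llm_complete(prompt)
```

The model's weights never change here; freshness comes entirely from what the search step injects into the prompt.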



