I don't know about GPT4, but I'd bet GPT3.5 is pretty traditional and boring. Its power comes from a really good, properly curated dataset (including the RLHF).
GPT3.5 turbo is probably much more interesting, because they seem to have figured out how to make it much more efficient (some kind of distillation?).
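For anyone unfamiliar, distillation here would mean training a smaller "student" model to imitate the output distribution of the big "teacher" model. A minimal sketch in PyTorch, purely illustrative (nothing about turbo's training is public; the temperature and shapes are made-up placeholders):

    import torch
    import torch.nn.functional as F

    T = 2.0  # softening temperature (hypothetical value)

    def distill_loss(student_logits, teacher_logits):
        # Student learns to match the teacher's softened output distribution.
        soft_targets = F.softmax(teacher_logits / T, dim=-1)
        log_probs = F.log_softmax(student_logits / T, dim=-1)
        return F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)

    # Toy check with random logits (batch of 4, vocab of 10):
    print(distill_loss(torch.randn(4, 10), torch.randn(4, 10)))

    # In a real training loop the teacher would be frozen:
    #   with torch.no_grad():
    #       teacher_logits = teacher(batch)
    #   loss = distill_loss(student(batch), teacher_logits)
    #   loss.backward(); optimizer.step()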
GPT4, if I had to make a very rough guess: probably flash attention, 100% of the (useful) internet/books for its dataset, and highly optimized hyperparameters.
I'd say with GPT4 they probably reached the limit of how big the dataset can be, because they are already using all the data that exists. Thus for GPT5 they'll have to scale in other ways.
To be fair, if the opposite were true, it might not be wise to admit it. Saturating the available high-quality training data is one of the few ways anyone can see OpenAI slowing down.
1. They would already be using everything they can get.
2. They would easily be able to explain what they're not using, without giving away sensitive secrets.
I wonder if we saw the same video - or maybe it is just ChatGPT being "great" in the wild? I see one guy asking another guy simple questions and getting weasel words for an answer.
Right, that's totally it. I came away thinking the interviewer seemed way sharper than the interviewee, which is pretty rare. The sheer throughput and speed of interesting questions was incredible. Too bad many of the answers were not.
> I'd say with GPT4 they probably reached the limit of how big the dataset can be
I'm curious about this too, not just about the dataset size but also the model size. My hunch is that the rapid improvements from making the underlying model bigger/giving it more data will slow, and there'll be more focus on shrinking the models/other optimisations.
I don't think we're anywhere close to the limit of sheer hardware scalability on this. Returns are diminishing, but if GPT-4 (with its 8k+ context window) is any indication, even those diminishing returns are still very worthwhile.
If anything, I wonder if the actual limit that'll be hit first will be the global manufacturing capacity for relevant hardware. Check out the stock price of NVDA since last October.
According to financial reports, they are building a $225 million supercomputer for AI. What we can probably expect is the same dataset with even more compute run on it.
There is a soft limit due to the computation required; the currently used model architectures are quadratic with respect to context size, so if you want a ten times larger context, that's going to need a hundred times more effort.
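Back-of-the-envelope: vanilla self-attention scores every token against every other token, so the work grows with the square of the sequence length. A toy sketch (plain PyTorch, arbitrary numbers; GPT-4's actual architecture isn't public):

    import torch

    def attention_cost(seq_len, d_model=128):
        # Naive attention materializes a seq_len x seq_len score matrix.
        q = torch.randn(seq_len, d_model)
        k = torch.randn(seq_len, d_model)
        scores = q @ k.T                # shape: (seq_len, seq_len)
        return scores.numel()           # entries computed ~ seq_len^2

    print(attention_cost(1_000))    #   1,000,000 score entries
    print(attention_cost(10_000))   # 100,000,000 entries (~400 MB of floats)
                                    # 10x the context -> 100x the work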