In short it seems like virtually all of the improvement in future AI models will come from better algorithms, with bigger and better data a distant second, and more parameters a distant third.
Of course, this claim is itself internally inconsistent in that it assumes that new algorithms won't alter the returns to scale from more data or parameters. Maybe a more precise set of claims would be (1) we're relatively close to the fundamental limits of transformers, i.e., we won't see another GPT-2-to-GPT-4-level jump with current algorithms; (2) almost all of the incremental improvements to transformers will require bigger or better-quality data (but won't necessarily require more parameters); and (3) all of this is specific to current models and goes out the window as soon as a non-transformer-based generative model approaches GPT-4 performance using a similar or lesser amount of compute.
I don't think LLMs are over [0]. I think we're relatively close to a local optimum in terms of what can be achieved with current algorithms. But I think OpenAI is at least as likely as any other player to create the next paradigm, and at least as likely as any other player to develop the leading models within the next paradigm regardless of who actually publishes the research.
Separately, I think OpenAI's current investors have a >10% chance to hit the 100x cap on their returns. Their current models are already good enough to address lots of real-world problems that people will pay money to solve. So far they've been much more model-focused than product-focused, and by turning that dial toward the product side (as they did with ChatGPT) I think they could generate a lot of revenue relatively quickly.
[0] Except maybe in the sense that future models will be predominantly multimodal and therefore not strictly LLMs. I don't think that's what you're suggesting though.
It's already relatively trivial to fine-tune generative models for various use cases, which implies huge gains to be had from targeted applications: not just for niche players, but also for OpenAI and others, who can build that fine-tuning into the base system, build ecosystems around it, or just purpose-build applications on top.
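As a rough sketch of what that fine-tuning looks like in practice (the model name and in-house data below are placeholders, and this is a generic Hugging Face training loop rather than anyone's actual pipeline):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in for any small open causal LM
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.train()

    # Hypothetical in-house data; a real run would use a proper dataset.
    domain_texts = [
        "Q: What is our refund policy? A: Refunds are issued within 30 days.",
        "Q: Which plan includes SSO? A: The enterprise plan includes SSO.",
    ]

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    for epoch in range(3):
        for text in domain_texts:
            batch = tokenizer(text, return_tensors="pt")
            # For causal LM fine-tuning, the labels are the input ids themselves.
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()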
I think it's more exciting if compute stops being the core differentiator, as purpose-trained models are exactly where I suspect the real value lies.
Especially as a differentiator for a company. If everyone is using ChatGPT, then they're all offering the same thing and I can just as well go to the source and cut out the middleman.
The other fun development to come is well-performing self-hosted models, and the idea of lightweight, domain-specific interface models that curate responses from bigger generalist models.
ChatGPT is fun, but it's very general: it doesn't know about my business, keep track of it, or interface with it. I fully expect to see the "Expert Systems" of old come back, but trained on our specific businesses.
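To make the "interface model" idea concrete, here's a hypothetical sketch: a small self-hosted domain model reranks candidate answers from a big generalist model. generalist_complete and domain_relevance are placeholder functions, not real APIs.

    from typing import Callable, List

    def curate_answer(
        question: str,
        generalist_complete: Callable[[str, int], List[str]],  # big general model (placeholder)
        domain_relevance: Callable[[str, str], float],          # small local scorer (placeholder)
        n_candidates: int = 4,
    ) -> str:
        # Ask the generalist for several candidate answers, then keep the one
        # the lightweight domain-specific model scores as most relevant to our business.
        candidates = generalist_complete(question, n_candidates)
        return max(candidates, key=lambda ans: domain_relevance(question, ans))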
I'd bet on a 2030 model trained on the same dataset as GPT-4 over GPT-4 trained with perfect-quality data, hands down. If data quality were that critical, practitioners could ignore the Internet and just train on books and scientific papers and only sacrifice <1 order of magnitude of data volume. Granted, that's not a negligible amount of training data to give up, but it places a relatively tight upper bound on the potential gain from improving data quality.
It's possible that this effect washes out as data increases, but researchers have shown that for smaller dataset sizes, average quality has a large impact on model output.
So true. There are still plenty of areas where we lack sufficient data to even approach applying this sort of model. How are we going to make similar advances in something like medical informatics, where we not only have less data readily available but it's also much more difficult to acquire more?
Improvements will not come from collecting more and more samples for current large models, but from improvements to algorithms, which may also focus on improving the quality and use of input data.
I don't think there is such a clear separation between algorithms and data as your comment suggests.
All the LC grinding may come in handy after all! /s
Which algorithms specifically show the most results when improved? Going into this, I thought the jump in improvements was really related to more advanced automated tuning and result correction, which could be done at scale, allowing a small team of data scientists to tweak the models until the desired results were achieved.
Are you saying instead that concrete predictive algorithms need improvement, or are we lumping the tuning into this?
I think it's unlikely that the first model to be widely considered AGI will be a transformer. Recent improvements to the computational efficiency of attention mechanisms [0] seem to improve results a lot, as does RLHF, but neither is a paradigm shift like the introduction of transformers was. That's not to downplay their significance - that class of incremental improvements has driven a massive acceleration in AI capabilities in the last year - but I don't think it's ultimately how we'll get to AGI.
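For a hedged illustration of the kind of efficiency gain I mean (not the specific method in [0]): PyTorch 2.x exposes fused, memory-efficient attention kernels that compute the same result as the naive formulation without materializing the full T x T score matrix.

    import torch
    import torch.nn.functional as F

    B, H, T, D = 1, 8, 1024, 64  # batch, heads, sequence length, head dim
    q, k, v = (torch.randn(B, H, T, D) for _ in range(3))

    # Naive attention: explicitly builds a T x T score matrix (O(T^2) memory).
    scores = (q @ k.transpose(-2, -1)) / D**0.5
    naive_out = torch.softmax(scores, dim=-1) @ v

    # Fused kernel: same math, dispatched to memory-efficient implementations
    # (e.g. FlashAttention-style) when available.
    fused_out = F.scaled_dot_product_attention(q, k, v)

    print(torch.allclose(naive_out, fused_out, atol=1e-4))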
I'm using AGI here to mean an arbitrary major improvement over the current state of the art. But given that OpenAI has the stated goal of creating AGI, I don't think it's a non-sequitur to respond to the parent comment's question
> Are you saying instead that concrete predictive algorithms need improvement, or are we lumping the tuning into this?
in the context of what's needed to get to AGI - just as, if NASA built an engine, we'd talk about its effectiveness in the context of space flight.
Traditional CS may contribute to slightly improving performance by allowing more training for the same compute, but it won't be an order of magnitude or more. The improvements to be gained will be found more in statistics than in CS per se.
I'm not sure. Methods like Chinchilla-style scaling and quantization have been able to reduce compute by more than an order of magnitude. There might very well be a few more levels of optimization within the same statistical paradigm.
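As a back-of-envelope sketch of the Chinchilla-style part of that claim (the C ≈ 6·N·D rule of thumb and the roughly 20-tokens-per-parameter ratio are approximations from the scaling-law literature; the compute budgets below are purely illustrative):

    def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
        # Training compute C ~ 6 * N * D, and compute-optimal D ~ 20 * N,
        # so N ~ sqrt(C / (6 * tokens_per_param)) and D follows from N.
        n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
        n_tokens = tokens_per_param * n_params
        return n_params, n_tokens

    for c in (1e21, 1e23, 1e25):
        n, d = chinchilla_optimal(c)
        print(f"C={c:.0e} FLOPs -> ~{n:.2e} params, ~{d:.2e} tokens")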
We need more data-efficient neural network architectures. Transformers work exceptionally well because they allow us to just dump more data into them, but ultimately we want models to learn advanced behavior without having to be fed all of Shakespeare.