In short it seems like virtually all of the improvement in future AI models will come from better algorithms, with bigger and better data a distant second, and more parameters a distant third.
Of course, this claim is itself internally inconsistent in that it assumes that new algorithms won't alter the returns to scale from more data or parameters. Maybe a more precise set of claims would be (1) we're relatively close to the fundamental limits of transformers, i.e., we won't see another GPT-2-to-GPT-4-level jump with current algorithms; (2) almost all of the incremental improvements to transformers will require bigger or better-quality data (but won't necessarily require more parameters); and (3) all of this is specific to current models and goes out the window as soon as a non-transformer-based generative model approaches GPT-4 performance using a similar or lesser amount of compute.
I don't think LLMs are over [0]. I think we're relatively close to a local optimum in terms of what can be achieved with current algorithms. But I think OpenAI is at least as likely as any other player to create the next paradigm, and at least as likely as any other player to develop the leading models within the next paradigm regardless of who actually publishes the research.
Separately, I think OpenAI's current investors have a >10% chance to hit the 100x cap on their returns. Their current models are already good enough to address lots of real-world problems that people will pay money to solve. So far they've been much more model-focused than product-focused, and by turning that dial toward the product side (as they did with ChatGPT) I think they could generate a lot of revenue relatively quickly.
[0] Except maybe in the sense that future models will be predominantly multimodal and therefore not strictly LLMs. I don't think that's what you're suggesting though.
It's already relatively trivial to fine-tune generative models for various use cases, which implies huge gains to be had from targeted applications: not just for niche players, but also for OpenAI and others, who can build that fine-tuning into the base system, build ecosystems around it, or just purpose-build applications on top.
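As a rough sketch of what that fine-tuning looks like in practice (the model name and in-house data below are placeholders, and this is a generic Hugging Face training loop rather than anyone's actual pipeline):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in for any small open causal LM
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.train()

    # Hypothetical in-house data; a real run would use a proper dataset.
    domain_texts = [
        "Q: What is our refund policy? A: Refunds are issued within 30 days.",
        "Q: Which plan includes SSO? A: The enterprise plan includes SSO.",
    ]

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    for epoch in range(3):
        for text in domain_texts:
            batch = tokenizer(text, return_tensors="pt")
            # For causal LM fine-tuning, the labels are the input ids themselves.
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()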
I think it's more exciting if compute stops being the core differentiator, as purpose-trained models are exactly where I suspect the real value lies.
Especially as a differentiator for a company. If everyone is using ChatGPT, then they're all offering the same thing and I can just as well go to the source and cut out the middleman.
The other fun development to come is well-performing self-hosted models, and the idea of lightweight, domain-specific interface models that curate responses from bigger generalist models.
ChatGPT is fun, but it's very general: it doesn't know about my business, keep track of it, or interface with it. I fully expect to see the "Expert Systems" of old come back, but trained on our specific businesses.
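To make the "interface model" idea concrete, here's a hypothetical sketch: a small self-hosted domain model reranks candidate answers from a big generalist model. generalist_complete and domain_relevance are placeholder functions, not real APIs.

    from typing import Callable, List

    def curate_answer(
        question: str,
        generalist_complete: Callable[[str, int], List[str]],  # big general model (placeholder)
        domain_relevance: Callable[[str, str], float],          # small local scorer (placeholder)
        n_candidates: int = 4,
    ) -> str:
        # Ask the generalist for several candidate answers, then keep the one
        # the lightweight domain-specific model scores as most relevant to our business.
        candidates = generalist_complete(question, n_candidates)
        return max(candidates, key=lambda ans: domain_relevance(question, ans))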
I'd bet on a 2030 model trained on the same dataset as GPT-4 over GPT-4 trained with perfect-quality data, hands down. If data quality were that critical, practitioners could ignore the Internet and just train on books and scientific papers and only sacrifice <1 order of magnitude of data volume. Granted, that's not a negligible amount of training data to give up, but it places a relatively tight upper bound on the potential gain from improving data quality.
It's possible that this effect washes out as data increases, but researchers have shown that for smaller dataset sizes, average quality has a large impact on model output.
So true. There are still plenty of areas where we lack sufficient data to even approach applying this sort of model. How are we going to make similar advances in something like medical informatics, where we not only have less data readily available but it's also much more difficult to acquire more?
Improvements will not come from collecting more and more samples for current large models, but from improvements to algorithms, which may also focus on improving the quality and use of input data.
I don't think there is such a clear separation between algorithms and data as your comment suggests.
All the LC grinding may come in handy after all! /s
Which algorithms specifically show the most results when improved? Going into this, I thought the jump in improvements was really related to more advanced automated tuning and result correction, which could be done at scale, allowing a small team of data scientists to tweak the models until the desired results were achieved.
Are you saying instead that concrete predictive algorithms need improvement, or are we lumping the tuning into this?
I think it's unlikely that the first model to be widely considered AGI will be a transformer. Recent improvements to the computational efficiency of attention mechanisms [0] seem to improve results a lot, as does RLHF, but neither is a paradigm shift like the introduction of transformers was. That's not to downplay their significance - that class of incremental improvements has driven a massive acceleration in AI capabilities in the last year - but I don't think it's ultimately how we'll get to AGI.
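For a hedged illustration of the kind of efficiency gain I mean (not the specific method in [0]): PyTorch 2.x exposes fused, memory-efficient attention kernels that compute the same result as the naive formulation without materializing the full T x T score matrix.

    import torch
    import torch.nn.functional as F

    B, H, T, D = 1, 8, 1024, 64  # batch, heads, sequence length, head dim
    q, k, v = (torch.randn(B, H, T, D) for _ in range(3))

    # Naive attention: explicitly builds a T x T score matrix (O(T^2) memory).
    scores = (q @ k.transpose(-2, -1)) / D**0.5
    naive_out = torch.softmax(scores, dim=-1) @ v

    # Fused kernel: same math, dispatched to memory-efficient implementations
    # (e.g. FlashAttention-style) when available.
    fused_out = F.scaled_dot_product_attention(q, k, v)

    print(torch.allclose(naive_out, fused_out, atol=1e-4))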
I'm using AGI here to mean an arbitrary major improvement over the current state of the art. But given that OpenAI has the stated goal of creating AGI, I don't think it's a non-sequitur to respond to the parent comment's question
> Are you saying instead that concrete predictive algorithms need improvement, or are we lumping the tuning into this?
in the context of what's needed to get to AGI - just as, if NASA built an engine, we'd talk about its effectiveness in the context of space flight.
Traditional CS may contribute to slightly improving performance by allowing more training for the same compute, but it won't be an order of magnitude or more. The improvements to be gained will be found more in statistics than in CS per se.
I'm not sure. Methods like Chinchilla-style scaling and quantization have been able to reduce compute by more than an order of magnitude. There might very well be a few more levels of optimization within the same statistical paradigm.
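As a back-of-envelope sketch of the Chinchilla-style part of that claim (the C ≈ 6·N·D rule of thumb and the roughly 20-tokens-per-parameter ratio are approximations from the scaling-law literature; the compute budgets below are purely illustrative):

    def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
        # Training compute C ~ 6 * N * D, and compute-optimal D ~ 20 * N,
        # so N ~ sqrt(C / (6 * tokens_per_param)) and D follows from N.
        n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
        n_tokens = tokens_per_param * n_params
        return n_params, n_tokens

    for c in (1e21, 1e23, 1e25):
        n, d = chinchilla_optimal(c)
        print(f"C={c:.0e} FLOPs -> ~{n:.2e} params, ~{d:.2e} tokens")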
We need more data-efficient neural network architectures. Transformers work exceptionally well because they allow us to just dump more data into them, but ultimately we want models to learn advanced behavior without having to be fed all of Shakespeare.