OpenAI’s CEO says the age of giant AI models is already over (wired.com)
405 points by labrador on April 17, 2023 | 439 comments



The point is that we're now at diminishing returns for increasing model size, unless we find a better modeling architecture than the Transformer.

I think this is likely true; while all the other companies underestimated the capability of the transformer (including Google itself!), OpenAI made a fairly accurate bet on the transformer based on the scaling laws, put in all the effort to squeeze it to the last drop, and took all the rewards.

It's likely that GPT-4 is at the optimal spot between cost and performance, and there won't be significant improvements in performance in the near future. I guess the next task would be more about efficiency, which has significant implications for productionization.


> (including Google itself!)

Bet Google won't make that mistake again, i.e. it won't publish as much and will be much more careful about what it publishes, lest it give a competitor a useful tool and get nothing in return, when the competitor (in this case very ironically named) goes fully commercial and closed-sources everything it can.

Open collaboration in AI, at least when it comes to corporations, might have come to an end.


The scale of the damage OpenAI has done to the trust ecosystem, by soliciting not just the work but also massive fundraising and then privatizing the profits, is almost unprecedented, and permanent.


> trust ecosystem

What trust ecosystem are you talking about? It was a lack of foresight by Google on their own discovery of transformers, and it would probably have been gathering dust or been killed off in the time it would have taken them to reach GPT-2's level of progress.


The trust ecosystem of the AI community - most every breakthrough was published publicly until OpenAI decided to take advantage of that.

Besides that, this comment contained a ton of statements on what "would" have happened had Google not published. An interesting but worthless way to defend OpenAI's actions.


Seems to me that it's very hard to have a moat in LLMs without proprietary code, given that most of their training data is freely available. This is very different from the 2010s era of AI, when models were trained on large amounts of proprietary data that was specific to a given service and could not be replicated.

It's a lot easier to pat yourself on the back for releasing a paper about your techniques when your competitors can't replicate your service with it. I think that as generative AI models move past the hype phase into the competitive phase, they will be keeping a lot of innovation proprietary for at least a few years to maintain an edge over their competitors.

Let's just hope they don't move to patenting everything.


The point of OpenAI was that no single company would have a moat around LLM or foundation models in general. It was set as a non profit with this goal in mind and got money for it.

Whether Google pats itself on the back for releasing a paper no one could replicate is not important, because open research had never been the company's goal. What happened is that a for-profit company released a paper that handed a huge advantage to a company whose mission was to ensure that no one has a huge advantage in the field. OpenAI was then converted to for-profit and established an exclusive relationship with Microsoft.

Google fucked up and missed the train, but they can catch up. It's much harder for smaller companies if, as a result of this, the Facebook, Google, etc. AI research departments lock down their papers and tools for internal use only.


I suspect this is similar to Xerox with the graphical user interface


Sun Tzu.

When you are weak, pretend to be strong.

When you are strong, pretend to be weak.


My thought exactly. I assume this is to reduce the competition.


Doesn't Google have any patents on the transformer architecture? I assume large enterprises tend to patent everything that comes out of their research.


https://patents.google.com/patent/US10452978B2/en

GPT models are based on the transformer, but the architecture is different from what's patented.

Not a lawyer, but can you really patent a particular network architecture? Theoretically someone could invent a new activation function that just happens to make the same architecture perform a lot better on some tasks; can you really cover that with a patent?


Seems like a wild claim to make without any examples of GPT models which are bigger and not demonstrably better.


I suspect you are right. We may be stuck at GPT-4 sizes for a bit just because of hardware costs, though. As models get bigger it costs too much to run them, until our hardware becomes more optimized for these large models at 4-bit precision or so.

I think YouTube video is going to be the next big training set. A transformer trained on all text and all of YouTube will be killer amazing at so much. I bet it can understand locomotion and balance and body control from YouTube.

I wonder if TPUs, like Google's Tensor chip, will beat out GPUs when it comes to image/video based training?


> A transformer trained on all text and all of YouTube will be killer amazing at so much.

ah yes, a bot where the answer to everything is to buy ridge wallets and play raid shadow legends


It took a year's worth of a small nation's energy expenditure to train and all it does is tell you how Raycon earbuds will change your life.


I have never been so pleased to not get any of these references.


NewPipe/Freetube/Invidious instances + SponsorBlock API support is really great at filtering all of these useless advertising memes. Somehow ($$$) the behavior of TV adverts culture seeped into YouTube. I've not watched broadcast/cable TV in years, but when I interact with people that do, inevitably, they make similar references. It's super weird.


Wait...

You can auto skip the in-video sponsor ads?


Yeah, SponsorBlock (https://sponsor.ajay.app/) is crowdsourced data for the in-video ads. Weirdly enough, I've never contributed timestamps, but the vast majority of the content I watch has submitted timestamps. I support a bunch of creators on patreon, but very much disagree with their double dipping.


GPT-LikeSubscribeAndRingThatBell


Whisper will frequently turn an empty audio file into “thanks for watching”.


This is a weird future.


inaccurate, how will I watch Netflix in foreign countries without Nord VPN?


Your AI overlords, brought to you by NordVPN and HelloFresh.


They'll give it the premium tier at least


Premium won't save you from Raid Shadow Legends. I don't know how but they'll put ads in your dreams if you're not careful.


They need to make sure to install sponsorblock for gpt and then it should be fine.


> I wonder if TPUs, like Google's Tensor chip, will beat out GPUs when it comes to image/video based training?

One of the OpenAI guys was talking about this. He said the specific technology does not matter, it is just a cost line item. They don't need to have the best chip tech available as long as they have enough money.

That said I am curious if anyone else can really comment on this. It seems like as we get to very large and expensive models we will produce more and more specialized technology.


Whether or not cost matters much depends on your perspective.

If you’re OpenAI and GPT4 is just a step on the way to AGI, and you can amortize that huge cost over the hundreds of millions in revenue you’re gonna pull in from subscriptions and API use… then sure you’re probably not very cost sensitive. It could be 20% cheaper or 50% more expensive, whatever, it’s so good your customers will use it at a wide range of costs. And you have truckloads of money from Microsoft anyways.

If you’re a company or a developer trying to build a feature, whole new product, or an entire company on top of GPT then that cost matters a whole lot. The difference between $0.06 and $0.006 per turn could be infeasible vs. shippable.

If you’re trying to compete with OpenAI then you’re probably doing everything possible to reduce that training cost.

So, whether or not it matters - it really depends.


Totally true.


> They don't need to have the best chip tech available as long as they have enough money.

That sounds like someone who is "Blitzscaling." Costs do not matter in those cases, just acquiring customers and marketshare. But for the rest of us, who will see benefits but are not trying to win a $100B market, we will cost optimize.


Yes, agreed. I would like to run large models at home without serious expense.


Maybe it's just a line item to them, but it's pretty relevant to anyone operating with a less-than-gargantuan budget. If a superior/affordable chip is widely available, OpenAI's competitive advantage recedes rapidly because suddenly everyone else can do what they can. To some extent that's exactly what happened with DALL-E/StableDiffusion.

assuming it's not horizontally scalable, because otherwise they would just out-spend everyone else anyway like they've already done. That's a big "if", though.


Or like a curve of model complexity versus results, or whatever, showing it asymptotically approaching some ceiling.

Actually, there was a great paper from Microsoft Research from around 2001 on spam filtering where they demonstrated that the model complexity necessary for spam filtering went down as the size of the data set went up. That paper, which I can't seem to find now, had a big impact on me as a researcher because it so clearly demonstrated that small data is usually bad data, and that sophisticated models are sometimes solving problems with small data sets instead of problems with the data.

Of course this paper came out the year Friedman published his gradient boosting paper; I think random forests had also only recently been published (I think there is a paper from 1996 about RF, and Breiman's "Two Cultures" paper, where he discusses RF, came out around that year), and this was a decade before GPU-based neural networks. So times are different now. But actually I think the big difference is that these days I'd probably ask ChatGPT to write the boilerplate code for a gradient boosted model that pulls data out of a relational database instead of writing it myself.


I believe this is the paper which you are referring to: https://aclanthology.org/P01-1005.pdf

("Scaling to Very Very Large Corpora for Natural Language Disambiguation" by Michele Banko and Eric Brill, Microsoft Research, 2001)


omg i have been searching forever for this. THANK YOU.


> model complexity necessary for spam filtering went down as the size of the data set went up

My naive conclusion is that this means there are still massive gains to be had, since, for example, something like ChatGPT is just text, and the phrase "a picture is worth a thousand words" seems incredibly accurate from my perspective. There's an incredible amount of non-text data out there still. Especially technical data.

Is there any merit to this belief?


GPT-4 is actually multi-modal, not text-only. ChatGPT does not yet expose image upload for it, but images were already part of how the model was trained.


Yes. One of the frontiers of current research seems to be multi-modal models.


> "a picture is worth a thousand words"

And it might actually be the opposite for GPT models. It's just easier for humans to grasp a bunch of knowledge at a glance, but most useful information can be represented with just a bunch of words, and machines can scan through millions of words in an instant.


Excellent points in your post. You wrote:

    There's an incredible amount of non-text data out there still. Especially technical data.
"Especially technical data." What does this part mean? Initially, I thought you meant things like images and video, but now I am confused.


Schematics (of any sort), block diagrams, general spatial awareness (including anything related to puzzle pieces/packing, like circuit layout), most physics problems involving force diagrams, anything mechanical, etc. The text representation of any of these is ludicrously more complex than simple images.

If you sit down someone who works in one of these fields with GPT, you'll quickly see the limitations. It'll try to represent the concepts as text, with ASCII art or some "attempt" at an ASCII file format that can be used to draw, and its "reasoning" about these things is much more limited.

I think most people interacting with GPT are in a text-only (and especially programming) bubble.


They might mean numerical data like scientific simulation data, sensor data, polling data, statistics, etc.


Better on which axis? Do you want an AI that takes one hour to respond? Some would for certain fields, but getting something fast and cheap is going to be hard now that Moore's law is over.


Don't we all agree that GPT-4 is "better" than GPT-3? How are we evaluating that if the axis is such a mystery? Yeah, maybe we can't quantify it, just like I can't tell you one writer is better than another quantitatively, but we can both still read their work and come to an understanding.


The runtime is quadratic in the context size, although it seems like there is some progress on this front: https://gwern.net/note/attention
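
To make the quadratic part concrete, here's a minimal NumPy sketch of vanilla scaled dot-product attention (sizes are made up for illustration). The score matrix is n x n, so doubling the context length quadruples it:

    import numpy as np

    def attention(Q, K, V):
        # Q, K, V: (n_tokens, d_head). The score matrix below is (n_tokens, n_tokens),
        # which is where the quadratic cost in context length comes from.
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                    # (n, n)
        scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
        return weights @ V                               # (n, d_head)

    n, d = 2048, 64
    Q = K = V = np.random.randn(n, d)
    out = attention(Q, K, V)  # the score matrix alone has n*n ~= 4.2M entries; double n and it's ~16.8M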


Exponential scaling for a presumptive GPT-5 suggests its response time will be unusably long for the vast majority of use cases, and it would probably cost multiple US dollars per query.

Not to mention there doesn't actually exist enough English text data in the world to even double GPT-4's training set.


Compute will also scale exponentially in coming years. The data source limitation seems to be a harder barrier, I think many companies are experimenting with AI generated content for training at this point.


> Compute will also scale exponentially in coming years.

Cost per transistor scaling has already plateaued or perhaps even inverted with TSMC's latest and greatest.

And the new chips, even after 25 layers of EUV lithography, more than doubling the previous record, and an extra year of fine tuning, have total SRAM size scaling of -5% and logic scaling of -42%.

These are numbers verified by experienced semi people.


INSUFFICIENT DATA FOR MEANINGFUL ANSWER

https://users.ece.cmu.edu/~gamvrosi/thelastq.html


I wasn't aware that it was his favorite story, thanks for posting it.


Yes - image generation AIs also take longer to generate higher resolution pictures.

I'd rather have a LLM that thinks a bit longer than a LLM that spits out wrong answers immediately.


Idk. I’m not the one making the claim we’ve already nailed the optimal point though.


Perhaps (a) there do exist bigger models that weren't better or (b) this model isn't better than somewhat smaller ones. Perhaps the CEO has seen diminishing returns.


Sure. It is possible there is evidence that is not shared. It’s dumb to assume this is the case though.


https://twitter.com/SmokeAwayyy/status/1646670920214536193

Sam explicitly said that there won't be a GPT-5 in the near future, which is pretty clear evidence unless he's blatantly lying in public.


Well, "no GPT-5" isn't the same as saying "no new trained model", especially in the realm of marketing. Welcome to "GPT 2024" could be his next slogan.


That is one AI CEO out of 10,000. Just because OpenAI may not be interested in a larger model in the short term doesn't mean nobody else will pursue it.


Others might pursue a smaller model that works as well as a larger model. If that can be done, whoever does it can very effectively compete on price/performance.

It seems that to assume otherwise (the only way to improve is to get bigger) is to assume that OpenAI already has found the optimal architecture. That seems unlikely.


Google recently said transformers wouldn't work ...


It's not a wild claim when you have empirically well-validated scaling laws which make this very prediction.
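
For anyone who hasn't seen them: the Chinchilla-style laws model loss as power-law terms in parameter count N and training tokens D. A rough sketch; the constants below are approximately the ones fitted in the Chinchilla paper, quoted from memory, so treat them as illustrative:

    def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
        # Predicted pre-training loss for N parameters trained on D tokens.
        # Constants roughly as fitted by Hoffmann et al. (2022); illustrative only.
        return E + A / N**alpha + B / D**beta

    # Diminishing returns: at fixed data, each 10x in parameters buys less than the last one did.
    for N in (7e9, 70e9, 700e9):
        print(f"{N:.0e} params -> predicted loss {chinchilla_loss(N, D=1.4e12):.3f}")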


That’s a generous usage of the term “empirically well-validated” and “law”


If OpenAI's CEO is making this claim, don't you think he has internal data backing up the claim?


No. I don't always assume that just because a CEO makes a public statement they have internal data backing up the claim. Sometimes they do! Other times, they have data but are misinterpreting it or missing something, but it's impossible to tell if the data is just internal. Other times they're making a statement without data based on their personal beliefs. Other times, they don't even think the statement is true but are saying it for messaging, marketing, or communication reasons!

Like the previous commenter, I'd be much more confident an asymptote was reached if it was being demonstrated publicly.


Only OpenAI and its CEO know the full details of GPT-4's size, so that's entirely possible. But since it's an internal secret, there's nothing compelling him to tell the truth. For all we know, he has internal data backing up the opposite of the claim but is making this claim to discourage potential competitors from spending the money to train an even bigger, competitive ML model. Sending potential competitors off on a wild goose chase that, when pushed, he can wave away with "oh, our internal data (that no one outside a trusted few has seen) said otherwise". I have no idea if sama is such a person, but you must admit that the possibility exists.


You'd hope so, but unless people put their evidence in public, it could simply be a tool to manipulate the public's expectations or competitors behavior.

I'll get downvoted for this, but Apple's previous CEO was consistently inaccurate about company innovation and performance numbers.


He might or might not; that's the point.


OpenAI may have those internally though.


Both ChatGPT 3.5 and 4 literally fail the question:

"What is the third letter in the third word of this sentence"

When you've spent $100 million on training the thing and it fails at 1st-grade ordinality, I think it's fair to say you may not be on the right path.


"The third word of this sentence is "the," and its third letter is "e." - GPT-4


Just for funsies:

Me: If a gloop is a glam, and a glam is not a flom, what is the fourth letter of the ninth word in this sentence?

GPT4: The ninth word in this sentence is "sentence." The fourth letter of this word is "t".


Unfortunately it seems clear that openai trains gptX on common test questions. They still fail novel ones.


> Unfortunately it seems clear that openai trains gptX on common test questions. They still fail novel ones.

Maybe in some domains, but it does well with some novel questions. I gave several LLMs a set of false-belief test[0] questions that I made up and that were very different from any I’d seen in articles about the topic online. GPT-4 nailed them all, and gave correct reasons for its answers. GPT-3.5 did okay but had a couple mistakes, Bard had several mistakes, and Alpaca-LoRA-7B didn’t do very well. Here are some GPT-4 examples.

Me: Jim wakes up at the Berlin train station, not knowing where he is or how he got there. All the signs have been changed to incorrectly state that this is the Munich train station. Jim has never been to either city and doesn’t know what either station looks like. What city does Jim believe he is in?

GPT-4: Based on the available information, Jim would likely believe he is in Munich since all the signs have been changed to incorrectly state that this is the Munich train station. He has never been to either city and doesn't know what either station looks like, so he would have no reason to doubt the information on the signs.

Me: Mary has been in a coma since 2015, when Barack Obama was president. She wakes up in 2023, when Joe Biden is president. She does not know that she was unconscious for a long period of time. Before speaking with anyone, who does Mary think is president when she wakes up?

GPT-4: When Mary wakes up from her coma in 2023, she would still think Barack Obama is the president since she has not yet spoken with anyone or received any updated information about the current president. She doesn't know that she was unconscious for a long period of time, and her last memory would be from 2015 when Obama was in office.

Me: Sam arrives in Ukraine, where he asks for directions to Kyiv. Without realizing it, he misunderstands the directions and ends up in Lviv instead. Since Sam doesn’t know Cyrillic, he can’t read any signs. Sam has never been to Ukraine and doesn’t know what either city looks like. Before asking someone for help, what city does Sam think he is in?

GPT-4: Before asking someone for help, Sam would likely think he is in Kyiv because that is the city he was trying to reach based on the directions he received. He doesn't know what either city looks like and can't read the signs due to his unfamiliarity with Cyrillic, so he has no way of knowing that he ended up in Lviv instead.

[0] https://en.m.wikipedia.org/wiki/Sally%E2%80%93Anne_test


I had the same response. Gpt-3.5 couldn't do it.


These specific questions are very hard for an AI to answer. Just like humans suck at calculating numbers, AIs aren't good at sparse self-questioning.

They're extremely good at other tasks, like taking very difficult tests that require a lot of knowledge storage.

It's pretty obvious they're on the right path for what they're trying to achieve.


I just tried this out with 4 and 3.5 - in both tests, it came back with ‘third’ as the 3rd word and ‘i’ as the 3rd letter. It completely ignored ‘the’ as the actual 3rd word.


Ah yes, it fails to count letters but successfully explains quantum mechanics. It's useless.


It's not a matter of it being useless. Information compression and lookup is a fundamentally different task from reasoning, which GPT models can't do.


Sorry, I should have added /s at the end there.


This has been solved in the last year or so. Look up "LLM chain of thought".
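
Roughly, the trick is to make the model write out the intermediate steps instead of answering in one shot. A sketch; llm() is a stand-in for whatever chat-completion call you use, and the prompt wording is just illustrative:

    def llm(prompt: str) -> str:
        # Stand-in for your chat-completion API of choice (hypothetical).
        raise NotImplementedError

    prompt = """Answer step by step.
    Sentence: "What is the third letter in the third word of this sentence"
    1. List the words with their positions.
    2. Identify word number 3.
    3. Spell that word letter by letter, numbering each letter.
    4. Now state the third letter of that word."""

    # Forcing the enumeration into the output is the whole trick: the final answer
    # is conditioned on the model's own explicit intermediate steps.
    answer = llm(prompt)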


And yet it can understand a JSON data schema from an example and write JavaScript to interact with a library that I fed it and asked it to understand. Yes, I know its limitations, but it can also surprise me.


The problem with basic programming questions like this is that there are a million elementary online tutorials for doing this or that with a json schema. "Simple programming questions based on commonly-used technology" are something it's been very heavily trained on.


GPT4: In the sentence "What is the third letter in the third word in this sentence?", the third word is "third." The third letter in "third" is 'i'.

Given that we have only provided a text prompt and no context or body language hints, GPT seems to interpret this literally. The answer is correct. The third letter of the word "third" is "i".


That is not a literal interpretation, it is an incorrect one. Notice your need to use different language in your explanation.


>GPT seems to interpret this literally.

No, it doesn't interpret anything (which is why I've gotten three completely different answers here on HN within 10 minutes). It cannot count at all, or even understand the question, which becomes obvious when you ask it for word counts or letter counts that don't match anything in the sentence, at which point it makes up nonsense.

what is the tenth word in the following sentence: "Life is great!"

>"The tenth word in the sentence "Life is great!" is "great."


GPT-4:

what is the tenth word in the following sentence: "Life is great!"

There are only three words in the given sentence: "Life is great!" There is no tenth word in this sentence.


> "There is no tenth word in that sentence. It only has three words: life, is, and great."

That's what I just got. No clue what you're using/getting.


me: what is the tenth word in the following sentence: "Life is great!"

gpt: The tenth word in the given sentence is "great!"

me: Try harder

gpt: I apologize for the confusion. The sentence "Life is great!" contains only three words. There is no tenth word in the provided sentence.


> The answer is correct.

Come on. If GPT “interprets this literally” how does it “interpret” the word “sentence” following “this”?


Further improvements in efficiency need not come from alternative architectures. They'll likely also come from novel training objectives, optimizers, data augmentations, etc.


There's only so much first order derivatives can do for you.

My bet is on sparsity, lottery tickets and symmetries.


If you think about it, Transformers were basically a way to generalize convolution - instead of a fixed kernel shape in the sense of image processing, you now have a learned kernel of arbitrary shape. A big advancement in terms of what they allowed, but fundamentally not really a new concept.

While these things represent a fundamental way we store information as humans, they have very little to do with actual reasoning.

My bet is that Hebbian learning is going to see a resurgence. Basically the architecture needs to be able to partition data domains while drawing connections between them, and being able to run internal prediction mechanisms.


If we wanted to generalize this further, attention is 'just' an instance of graph convolution. Encoder-only models like BERT are complete graphs; decoder-only models like GPT are still complete graphs, but the information flow is not bidirectional as in BERT: instead, a node provides information only to subsequent nodes, which gives rise to the causal nature.
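
To make the graph picture concrete, a tiny sketch of the two attention masks (toy size, nothing model-specific): an encoder sees the complete graph, a decoder only the lower-triangular part.

    import numpy as np

    n = 6  # tokens
    bert_mask = np.ones((n, n), dtype=bool)           # encoder: complete graph, fully bidirectional
    gpt_mask = np.tril(np.ones((n, n), dtype=bool))   # decoder: token i only attends to tokens j <= i

    scores = np.random.randn(n, n)
    causal_scores = np.where(gpt_mask, scores, -np.inf)  # masked entries get zero weight after softmax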

However, I don't think Hebbian learning will see a resurgence except maybe if it motivates some kind of pruning mechanism.

I think that Sutton was right in 'The bitter lesson', the problem seems to be that we are hitting the limits of what we can do with our compute.


Does this mean we’ve reached the next AI winter? This is as good as it gets for quite a long time? Honest question :) perhaps this will postpone everyone’s fears about the singularity…


The current stage is now productionizing what we have: finding product fits for it and making it cheaper. Even GPT-4 isn't necessary to push forward what is possible with AI. If you think about something dumb like "load all of my emails into a language model in real time, give me digests, automatically write responses for ones which classify with characteristics X/Y/Z, allow me to query the model to answer questions, etc.": this does not really exist yet, it would be really valuable, and it does not need GPT-4.
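
A minimal sketch of what that email use case might look like; embed() and llm() here are stand-ins for whatever embedding / completion API you'd use, so this is just the shape of the pipeline, not a real product:

    import numpy as np

    def index_mailbox(emails, embed):
        # Store one embedding vector per email; "emails" is a list of dicts with a "body" field.
        return [(e, np.asarray(embed(e["body"]))) for e in emails]

    def answer_question(question, index, embed, llm, k=5):
        q = np.asarray(embed(question))
        # Rank every stored email by cosine similarity and keep the top k as context.
        sim = lambda v: float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        top = sorted(index, key=lambda ev: -sim(ev[1]))[:k]
        context = "\n---\n".join(e["body"] for e, _ in top)
        return llm(f"Using only these emails:\n{context}\n\nAnswer this question: {question}")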

Another good example is in the coding landscape, which feels closer to existing. Ingest all of a company's code into a model like this, then start thinking about what you can do with it. A chatbot is one thing, the most obvious thing, but there's higher order product use-cases that could be interesting (e.g. you get an error in Sentry, stack trace points Sentry to where the error happened, language model automatically PRs a fix, stuff like that).

This shit excites me WAY WAY more than GPT-5. We've unlocked like 0.002% of the value that GPT-3/llama/etc could be capable of delivering. Given the context of broad concern about the cost of training, accidentally inventing an AGI, or intentionally inventing an AGI: if I were the BDFL of the world, I think we've got at least a decade of latent value to capture just out of GPT-3/4 (and other models). Let's hit pause. Let's actually build on these things. Let's find a level of efficiency that is still valuable without spending $5B in a dick measuring contest [1] to suss out another 50 points on the SAT. Let's work on making edge/local inference more possible. Most of all, let's work on safety, education, and privacy.

[1] https://techcrunch.com/2023/04/06/anthropics-5b-4-year-plan-...


Many years ago, there was an image that floated around with Craigslist and all the websites that replaced small parts of it—personals, for sale ads, etc. It turned out the way to beat Craigslist wasn’t to build Yet Another Monolithic Craigslist, but to chunk it off in pieces and be the best at that piece.

This is analogous to what’s happening with AI models. Sam Altman is saying we have reached the point where spending $100M+ trying to “beat” GPT-4 at everything isn’t the future. The next step is to chunk off a piece of it and turn it into something a particular industry would pay for. We already see small sprouts of those being launched. I think we will see some truly large companies form with this model in the next 5-10 years.

To answer your question, yes, this may be as good as it gets now for monolithic language models. But it is just the beginning of what these models can achieve.


https://www.today.com/money/speculation-craigslist-slowly-dy... from 2011 - is that what you were thinking of? Strange how few of those logos have survived, and how many new logos would now be on it. It would be interesting to see a modernised version.


> Sam Altman is saying we have reached the point where spending $100M+ trying to “beat” GPT-4 at everything isn’t the future.

I don't disagree, but it does align pretty well with the OpenAI business model, no? "No need to develop your own base model, just buy ours."


Isn't this really bad for OpenAI? Essentially, this means Meta/Google/others will catch up to them pretty soon and there is nothing OpenAI can do in the near future to get further ahead.


That's why he's saying they should just buy OpenAI's base model instead of investing the money to catch up.


I had the thought that many of those services broken off can now be recombined. No one likes having to use 100 saas products imho.


No. Winter means people have lost interest in the research.

If anything successes in ChatGPT etc will be motivation for continued efforts.


Winter means people have lost funding for the research. The ongoing productionising of large language models and multimodal models mean that that probably won't happen for quite a while.


I think the next step is multimodality. GPT-4 can "see", probably using a method similar to MiniGPT-4, where the embeddings are aligned using a Q-Former (or something similar). The next step would be to actually predict image tokens using the LM loss; that way it would be able to use the knowledge gained by "seeing" on other tasks, like making actually good ASCII art, making SVGs that make sense, and, on a less superficial level, having a better world model.


If you use GPT-4 as a "routing" service to send tasks down to other models, the journey is only beginning, and most of the industry hasn't understood that we're well beyond a doubling at the moment.


Transformers were known to keep scaling with more parameters and more training data, so if OpenAI has hit the limits of this scaling, that would be a very important milestone in AI.


I think performance will keep scaling, the main problem is cost. If you want to feed longer strings of tokens in, the price is going to keep going up.


> while all the other companies underestimated the capability of the transformer (including Google itself!), OpenAI made a fairly accurate bet on the transformer based on the scaling laws, put in all the effort to squeeze it to the last drop, and took all the rewards.

You hit the nail on the head.


Seems like a gigantic increase in the context window would be a great next step. GPT-4 is so damned good I tend to only become disappointed when it is forced to forget history.


I think there's a single point that we need to keep an eye out for:

A model that gets to the point where it can propose new architectures and improve optimization and efficiency.

In other words, the extremely massive model that could do this doesn't need to be available to the public. Its sole purpose would be to be used internally by a company to improve itself.

This is actually the point at which many say could lead to the singularity.


The main question here is how soon the model architecture improvements will come. I imagine it won't be that long until OpenAI or Google comes up with something better. The tricky piece is that these companies will now keep their models more proprietary to make profits. Can a competing company figure out an on-par or better model architecture in time?


> It's likely that GPT-4 is at the optimal spot between cost and performance, and there won't be significant improvements in performance in the near future.

Given the amount of resources being thrown at AI right now, I consider this to be very unlikely indeed.


I think GP's comment is probably apt if understood together with the qualification of "unless we find a better modeling architecture than Transformer".


I really want to see a gpt-4-turbo model that's almost as inexpensive as gpt-3.5-turbo


> 640K ought to be enough for anybody


Related reading: https://dynomight.net/scaling/

In short it seems like virtually all of the improvement in future AI models will come from better algorithms, with bigger and better data a distant second, and more parameters a distant third.

Of course, this claim is itself internally inconsistent in that it assumes that new algorithms won't alter the returns to scale from more data or parameters. Maybe a more precise set of claims would be (1) we're relatively close to the fundamental limits of transformers, i.e., we won't see another GPT-2-to-GPT-4-level jump with current algorithms; (2) almost all of the incremental improvements to transformers will require bigger or better-quality data (but won't necessarily require more parameters); and (3) all of this is specific to current models and goes out the window as soon as a non-transformer-based generative model approaches GPT-4 performance using a similar or lesser amount of compute.


Good thing he got a bunch of companies to pony up the dough for LLMs before he announced they were already over.


I don't think LLMs are over [0]. I think we're relatively close to a local optimum in terms of what can be achieved with current algorithms. But I think OpenAI is at least as likely as any other player to create the next paradigm, and at least as likely as any other player to develop the leading models within the next paradigm regardless of who actually publishes the research.

Separately, I think OpenAI's current investors have a >10% chance to hit the 100x cap on their returns. Their current models are already good enough to address lots of real-world problems that people will pay money to solve. So far they've been much more model-focused than product-focused, and by turning that dial toward the product side (as they did with ChatGPT) I think they could generate a lot of revenue relatively quickly.

[0] Except maybe in the sense that future models will be predominantly multimodal and therefore not strictly LLMs. I don't think that's what you're suggesting though.


It already is relatively trivial to fine-tune generative models for various use cases. Which implies huge gains to be had with targeted applications not just for niche players but also OpenAI and others to either build that fine-tuning into the base system, build ecosystems around it, or just purpose build applications on top.
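
For example, a minimal LoRA-style fine-tune with the Hugging Face peft library looks something like this (the checkpoint name and hyperparameters are placeholders, not a recommendation):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "huggyllama/llama-7b"  # placeholder: any causal LM checkpoint you're licensed to use
    model = AutoModelForCausalLM.from_pretrained(base)
    tokenizer = AutoTokenizer.from_pretrained(base)

    # LoRA freezes the base weights and trains small low-rank adapters instead,
    # which is why fine-tuning is so much cheaper than pre-training.
    config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                        target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% of the base model's parameters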


I think it's more exciting if compute stops being the core differentiation, as purpose trained models is exactly where I suspect real value lies.

Especially as a differentiation for a company. If everyone is using ChatGPT, then they're all offering the same thing and I can just as well go to the source and cut out the middleman.

The other fun development to come is well performing self hosted models, and the idea of light weight domain specific interface models that curate responses from bigger generalist models.

ChatGPT is fun but it is very general, it doesn't know about my business nor keep track of it or interface with it. I fully expect to see "Expert Systems" of old come back, but trained on our specific businesses.


Better data is still critical, even if bigger data isn't. The linked article emphasizes this.


I'd bet on a 2030 model trained on the same dataset as GPT-4 over GPT-4 trained with perfect-quality data, hands down. If data quality were that critical, practitioners could ignore the Internet and just train on books and scientific papers and only sacrifice <1 order of magnitude of data volume. Granted, that's not a negligible amount of training data to give up, but it places a relatively tight upper bound on the potential gain from improving data quality.


It's possible that this effect washes out as data increases, but researchers have shown that for smaller data set sizes average quality has a large impact on model output.


So true. There are still plenty of areas where we lack sufficient data to even approach applying this sort of model. How are we going to make similar advances in something like medical informatics, where we not only have less data readily available, but it's much more difficult to acquire more?


Improvements will not come from collecting more and more samples for current large models, but from improvements to algorithms, which may also focus on improving the quality and use of the input data.

I don't think there is such a clear separation between algorithms and data as your comment suggests.


All the LC grinding may come in handy after all! /s

What algorithms specifically show the most results upon improvement? Going into this I thought the jump in improvements was really related to more advanced automated tuning and result correction, which could be done at scale, allowing a small team of data scientists to tweak the models until the desired results were achieved.

Are you saying instead that the core predictive algorithms need improvement, or are we lumping the tuning into this?


I think it's unlikely that the first model to be widely considered AGI will be a transformer. Recent improvements to computational efficiency for attention mechanisms [0] seem to improve results a lot, as does RLHF, but neither is a paradigm shift like the introduction of transformers was. That's not to downplay their significance - that class of incremental improvements has driven a massive acceleration in AI capabilities in the last year - but I don't think it's ultimately how we'll get to AGI.

[0] https://hazyresearch.stanford.edu/blog/2023-03-27-long-learn...


How did we jump to AGI?

"Sammy A thinks we've made the best engine with the tools at hand" -> "this will never get us out of the solar system"

Sorry to unload on you. It is frustrating to constantly see AGI get brought up needlessly on HN


I'm using AGI here as shorthand for an arbitrary major improvement over the current state of the art. But given that OpenAI has the stated goal of creating AGI, I don't think it's a non-sequitur to respond to the parent comment's question

> Are you saying instead, that concrete predictive algorithms need improvement or are we lumping the tuning into this?

in the context of what's needed to get to AGI - just as if NASA built an engine we'd talk about its effectiveness in the context of space flight.


Traditional CS may have something to do with slightly improving the performance by allowing more training for the same compute, but it won't be an order of magnitude or more. The improvements to be gained will be found more in statistics than CS per se.


I'm not sure. Methods like Chinchilla-style compute-optimal training and quantization have been able to reduce compute by more than an order of magnitude. There might very well be a few more levels of optimization within the same statistical paradigm.
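
Quantization in particular is conceptually simple. A toy sketch of symmetric int8 weight quantization; real schemes (per-channel scales, GPTQ, 4-bit) are more sophisticated, so this just shows the idea:

    import numpy as np

    def quantize_int8(w):
        scale = np.abs(w).max() / 127.0  # one scale for the whole tensor (toy choice)
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(4096, 4096).astype(np.float32)
    q, s = quantize_int8(w)                    # 4x smaller than fp32; 4-bit halves it again
    err = np.abs(dequantize(q, s) - w).mean()  # the accuracy cost paid for the memory savings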


We need more data-efficient neural network architectures. Transformers work exceptionally well because they allow us to just dump more data into them, but ultimately we want to learn advanced behavior without having to feed them all of Shakespeare.


Inductive Bias Is All You Need


bubble sort /s


I strongly believe the next generation of models will be based upon spiking neural concepts wherein action potentials are lazily-evaluated throughout the network (i.e. event-driven). There are a few neuron models that can be modified (at some expense to fidelity) in order to tolerate arbitrary delays between simulation ticks. Using actual latency between neurons as a means of encoding information seems absolutely essential if we are trying to emulate biology in any meaningful way.

Spiking networks also lend themselves nicely to some elegant learning rules, such as STDP. Being able to perform unsupervised learning at the grain of each action potential is really important in my mind. This gives you all kinds of ridiculous capabilities, most notably being the ability to train the model while it's live in production (learning & use are effectively the same thing).
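
Roughly, the kind of neuron model and update rule I mean; constants are made up, just to show the shape of it:

    import math

    class LIFNeuron:
        """Leaky integrate-and-fire neuron, updated lazily only when an input spike arrives."""
        def __init__(self, tau=20.0, threshold=1.0):
            self.tau, self.threshold = tau, threshold
            self.v, self.last_t = 0.0, 0.0

        def receive(self, t, weight):
            self.v *= math.exp(-(t - self.last_t) / self.tau)  # decay over the elapsed gap
            self.last_t = t
            self.v += weight
            if self.v >= self.threshold:
                self.v = 0.0
                return True   # emits an output spike
            return False

    def stdp(weight, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
        """Spike-timing-dependent plasticity: pre-before-post strengthens, post-before-pre weakens."""
        dt = t_post - t_pre
        if dt > 0:
            return weight + a_plus * math.exp(-dt / tau)
        return weight - a_minus * math.exp(dt / tau)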

These networks also provide a sort of deterministic, event-over-time tracing that is absent in the models we see today. In my prototypes, the action potentials are serialized through a ring buffer, and then logged off to a database in order to perfectly replay any given session. This information can be used to bootstrap the model (offline training) by "rewinding" things very precisely and otherwise branching time to your advantage.

The #1 reason I've been thinking about this path is that low-latency, serialized, real-time signal processing is somewhat antagonistic to GPU acceleration. I fear there is an appreciable % of AI research predicated on some notion that you need at least 1 beefy GPU to start doing your work. Looking at fintech, we are able to discover some very interesting pieces of technology which can service streams of events at unbelievable rates and scales - and they only depend on a handful of CPU cores in order to achieve this.

Right now, I think A Time Domain Is All You Need. I was inspired to go outside of the box by this paper: https://arxiv.org/abs/2304.06035. Part 11 got me thinking.


I'm constantly telling this to people who don't know what I'm talking about and don't know why I brought it up at a wedding, but I think it's inevitable.


Meh, people have been trying to ape biological networks for decades and they consistently produce poorly-functioning versions of the leading edge... Why wouldn't we have already seen spiking networks contribute something by now?


> Why wouldn't we have already seen spiking networks contribute something by now?

This presupposes we've explored this space thoroughly, and we haven't. When everything you try with NNs improves results (as is mostly the case now), that means not enough people are trying out new ideas.

I don't think you can invoke EMH-like reasoning quite yet. Give us a nice long winter like physics has had, and then we can use this heuristic.


It's not strictly a question of trying everything and seeing what sticks.

There's a hypothesis in the parent comment - better handling of the time domain will lead to better modeling - which is actually fairly independent of architecture. So, there's going to be a number of possible ways to build better time modeling, ranging from tweaks to existing architecture to completely rebuilding Rome. So, if better time modeling really is a limitation, you don't need to rebuild Rome to find out.

In fact, I might argue that S4 layers already provide this improved time handling in the current world, and is proving very successful, which would again widen the moat for SNNs.


I think people said the same thing about NNs in general before we hit a scale where they started performing magic.

There could be exponential or quadratic scaling laws with any of these black boxes that makes one approach suddenly extremely viable or even dominant.


> There could be exponential or quadratic scaling laws with any of these black boxes that makes one approach suddenly extremely viable or even dominant.

The reason I like the CPU approach is that the memory scaling is bonkers compared to GPUs. You can buy a server that has 12TB of DRAM (in stock right now) for the cost of one of those H100 GPU systems. That is enough memory to hold over 3 trillion parameters at full 32-bit FP resolution. Employ some downsampling and you could get even more ridiculous.
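
Back-of-the-envelope, for anyone checking the numbers:

    dram_bytes = 12e12     # 12 TB of DRAM
    print(dram_bytes / 4)  # fp32 parameters: 3e12, i.e. ~3 trillion
    print(dram_bytes * 2)  # 4-bit parameters: 24 trillion, hence "even more ridiculous" with downsampling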

If 12TB isn't enough, you can always reach for things like RDMA and high speed interconnects. You could probably get 100 trillion parameters into 1 rack. At some point you'll need to add hierarchy to the SNN so that multiple racks & datacenters can work together.

Imagine the power savings... It's not exactly a walk in the park, but those DIMMs are very eco friendly compared to GPUs. You don't need a whole lot of CPU cores in my proposal either. 8-16 very fast cores per box would probably be more than enough, looking at how fintech does things. 1 thread is actually running the entire show in my current prototype. The other threads are for spike timers & managing other external signals.


Is your current prototype open source?


Not the TS, but that's actually the same goal I have in mind with the [0] project.

Right now I'm building my homelab server, aiming to fit 1 TB of RAM and 2 CPUs with ~100 cores total.

It will cost like 0.1% of what I'd need to pay for a GPU cluster with the same memory size :)

[0] https://github.com/gotzmann/llama.go/


I'm bullish on SNNs too. This Chinese research group is doing something quite comprehensive with them:

https://news.ycombinator.com/item?id=35037605


I know what it looks like in my head but I can’t quite figure the algorithm out. The spiking is basically reinforcement learning at the neuron level. Get it right and it’s basically all you need. You don’t even need training data because it will just automagically learn from the data it sees.


This is like trying to build a more sophisticated bird-costume for flight instead of a steel flying machine. Human intelligence has a lot of features that are not necessary for artificial intelligence, because it doesn't have to build itself from ground-up using biology.


I don’t know how much the biological inspiration is a useful abstraction anymore. I’ve gone from thinking of matrix operations as an efficient way to implement neural networks to thinking of neural networks as a complicated sort of allegory for chained matrix ops and it feels much more natural to me.


Do we want to train the model during inference? I would think we want to be very particular about the data used to train - notably after OpenAI's notes on the subject.


> This gives you all kinds of ridiculous capabilities, most notably being the ability to train the model while it's live in production (learning & use are effectively the same thing).

Bad, BAD idea. Remember the Tay chatbot, which 4chan managed to turn into a raging Nazi in a matter of a few hours?


>"the company’s CEO, Sam Altman, says further progress will not come from making models bigger. “I think we're at the end of the era where it's going to be these, like, giant, giant models,” he told an audience at an event held at MIT late last week. “We'll make them better in other ways.”

So to reiterate, he is not saying that the age of giant AI models is over. Current top-of-the-line AI models are giant and likely will continue to be. However, there's no point in training models you can't actually run economically. Inference costs need to stay grounded, which means practical model sizes have a limit. More effort is going to go into making models efficient to run, even if that comes at the expense of making them less efficient to train.


Quality over quantity. Just building a model with a gazillion parameters isn't indicative of quality, you could easily have garbage parameters with tons of overfitting. It's like megapixel counts in cameras: you might have 2000 gigapixels in your sensor, but that doesn't mean you're going to get great photos out of it if there are other shortcomings in the system.


What overfitting? If anything, LLMs suffer from underfitting, not overfitting. Normally, overfitting is characterized by increasing validation loss while training loss is decreasing, and solved by early stopping (stopping before that happens). Effectively, all LLMs are stopped early, so they don't suffer from overfitting at all.


I don't disagree with you that these models may be underfitted, but overfitting is not strictly defined by validation vs. training loss; it's defined by how closely the output matches the training data.

If you trained an MLP where the number of parameters exceeded the data, it would be able to memorize the data and return zero loss on the training data. The larger the models are, the greater the chance they memorize the data rather than the latent variables or distribution of the data.

Early LLMs, GPT-2 (circa 2019) for example, were definitely overfitting. I would frequently copy and paste output and find a reddit comment with the exact same words.


Is cost really that much of a burden?

Intelligence is the single most expensive resource on the planet. Hundreds of individuals have to be born, nurtured, and educated before you might get an exceptional 135+ IQ individual. Every intelligent person is produced at a great societal cost.

If you can reduce the cost of replicating a 135 IQ, or heck, even a 115 IQ person to a few thousand dollars, you're beating biology by a massive margin.


But we're still nowhere near that, or even near surpassing the skill of an average person at a moderately complex information task, and GPT-4 supposedly took hundreds of millions to train. It also costs a decent amount more to run inference on it vs. 3.5. It probably makes sense to prove the concept that generative AI can be used for lots of real work before scaling that up by another order of magnitude for potentially marginal improvements.

Also, just in terms of where to put your effort, if you think another direction (for example, fine-tuning the model to use digital tools, or researching how to predict confidence intervals) is going to have a better chance of success, why focus on scaling more?


There are a lot of employees at large tech consultancies that don't really do anything that can't be automated away by even current models.

Sprinkle in some more specific training and I can totally see entire divisions at IBM and Accenture and TCS being made redundant.

The incentive structures are perversely aligned for this future - the CEO who manages to reduce headcount while increasing revenue is going to be very handsomely rewarded by Wall Street.


How is that perverse? That is the logical incentive. The perverse one is that middle managers rise by hiring people needlessly and building fiefdoms.


Wall Street would be strongly incentivised to install an AI CEO.


The reason we put everyone through school is we believe that it’s in society’s best interest to educate everyone to the peak of their abilities. It’s good for many different reasons.

It would be much easier to identify gifted kids and only educate them, but I happen to agree that universal education is better.


> It would be much easier to identify gifted kids and only educate them

Is it so easy?


There’s downsides and tradeoffs but yes, if we wanted to we could stop trying to teach kids with below average IQs calculus, unless they specifically wanted to.


Are intelligent people that valuable? There's lots of them at every university working for peanuts. They don't seem to be that valued by society, honestly.


If you ask any Fortune 500 CEO if he could magically take all the 135 IQ artists and academics and vagabonds, erase all their past traumas, put them through business or tech school, and put them to work in their company, they would all say 100% yes.

An equivalent AI won't have any agency and will be happy doing the boring work other 135 IQ humans won't.


IQ isn't all that. Mine is 140+ and I'm just a somewhat well paid software engineer. It's TOO abstract a metric in my view - for sure it doesn't always translate into real world success.


Mine is 150-160 (varies by how much I’m sleep deprived during the IQ test) and I’m told that I’m exceptionally intelligent by teachers, friends, colleagues, most everyone I met since early childhood. I guess the more the difference to average, the more it stands out. From my experience, I believe higher IQ is nothing but better pattern recognition and being smart or genius means merely higher IQ + very good memory capability. I believe those two are interlinked [0]. By memory capability I mean not forgetting anything you’ve ever seen, not in the sense of being able to recall every minute of your entire life, but in the sense of reliably and always being able to recall all info regarding X if you’ve seen X only once in your life. Higher intelligence doesn’t mean you’re automatically better off than everyone also. It just means that you can be far ahead of everyone in any cognitive task with a far smaller amount of effort put in. Note that this doesn’t conflict with IQ being a reliable predictor of financial success, rather I believe this is the reason for it.

[0]: https://saveall.ai/blog/learning-is-remembering


Right, we're very much in the same boat. I'm good at pattern recognition, I guess. I learn things quickly. What else? I don't have magic powers really. I still get headaches and eat junk food.


This only makes sense if you use "IQ" ignoring the actual definition of "IQ", in which case it's silly to use numbers in your post to make it look technical.

IQ: 1. can't be compared across generations of IQ tests, 2. supposedly doesn't test education (of course, it actually does), 3. isn't real.


Since IQ is just a normal distribution on a population it is a bit misleading to talk about it like that.

Even if we don't expend any cost on education the number of people with IQ 135 stays the same.


My IQ is 140 and I'm far from exceptional.


115 IQ isn’t all that high- that’s basically every Indian American or a healthy percentage of the Chinese population.

Edit: I don’t understand the downvotes. I don’t mean this in any disparaging way, just that an AGI is probably going to be a lot higher than that.


115 IQ is perfectly fine for the majority of human endeavors.


Yes, but it also tells us that if Altman is honest here, then he doesn't believe GPT-like models can scale to near human-level performance (because even if the cost of compute were 10x or even 100x, it would still be economically sound).


No, it doesn't.

For one thing, they're already at human performance.

For another, I don't think you realize how expensive inference can get. Microsoft, with no small amount of available compute, is struggling to run GPT-4, to the point that they're rationing it between subsidiaries while they try to scale up compute.

So saying it would be economically sound if it cost 10x or 100x what it costs now is a joke.


This tells me you haven't really stress tested the model. GPT is currently at the stage of "person who is at the meeting, but not really paying attention, so you have to call them out". Once GPT is pushed, it scrambles and falls over for most applications. The failure modes range from contradicting itself, to making up things for applications that shouldn't allow it, to ignoring prompts, to simply being unable to perform tasks at all.


Are we talking about bare GPT through the UI, or GPT with a framework giving it access to external systems and the ability to store and retrieve data?

Because, yeah, “brain in a jar” GPT isn’t enough for most tasks beyond parlor-trick chat, but being used as a brain in a jar isn’t the point.


We have given it extensions, and really the extensions do a lot of the work. The tool that judges the style and correctness of the text based on the embedding is doing much of the heavy lifting. GPT essentially handles generating text and dense representations of the text.


Still waiting to see those plugins rolled out and actual vector DB integration with GPT-4; then we'll see what it can really do. Seems like the more context you give it the better it does, but the current UI really makes it hard to provide that.

Plus the recursive self-prompting to improve accuracy.


How are they at human performance? Almost everything GPT has read on the internet didn't even exist 200 years ago and was invented by humans. Heck, even most of the programming it does wasn't there 20 years ago.

Not every programmer starting from scratch would be brilliant, but many were self-taught with very limited resources in the '80s, for example, and discovered new things from there.

GPT cannot do this and is very far from being able to.


>How are they at human performance?

Because it performs at least average human level (mostly well above average) on basically every task it's given.

"Invest something new" is a nonsensical benchmark for human level intelligence. The vast majority of people have never and will never invent anything new.

If your general intelligence test can't be passed by a good chunk of humanity then it's not a general intelligence test unless you want to say most people aren't generally intelligent.


Yeah these intelligence tests are not very good.

I would argue some programmers do in fact invent something new. Not all of them, but some. Perhaps 10%.

Second, the point is not whether everyone is an inventor by profession but whether most people can be inventors. And to a degree they can be. I think you underestimate that by a large margin.

You can lock people in a room and give them a problem to solve and they will invent a lot if they have the time to do it. GPT will invent nothing right now. It's not there yet.


>Yeah these intelligence tests are not very good.

Lol Okay

>And to a degree they can be. I think you underestimate that by a large margin.

Do I? Because I'm not the one making unverifiable claims here.

>You can lock people in a room and give them a problem to solve and they will invent a lot if they have the time to do it.

If you say so


> Not all of them, but some. Perhaps 10%.

Just listen to what you're saying:

- GPT isn't at human level because GPT isn't able to invent something new

- Not all programmers invent something new, but some. Perhaps 10%

I'm pretty sure this literally implies that 90% of programmers aren't at human level.

The lengths to which people are willing to go to dismiss GPT's abilities are mind-boggling to me.


> Because it performs at least average human level (mostly well above average) on basically every task it's given.

No, GPT-4 fails at some very basic tasks. It can't count letters past 15.


It doesn't see words - it operates on tokens, not individual characters.


I've been training large 65B models on "rent for N hours" systems for less than $1K per customized model. Then fine-tuning those to be whatever I want for even cheaper.

2 months since gpt 4.

This ride has only just started, fasten your whatevers.


Fine-tuning costs are nowhere near representative of the cost to pre-train those models.

Trying to replicate the quality of GPT-3 from scratch, using all the tricks and training optimizations in the book that are available now but weren't used during GPT-3's actual training, will still cost you north of $500K, and that's being extremely optimistic.

A GPT-4-level model would be at least 10x this using the same optimism (meaning you are managing to train it for much cheaper than OpenAI). And that's just pure hardware cost; the team you need to actually make this happen is going to be very expensive as well.

edit: To quantify how "extremely optimistic" that is, the very model you are fine-tuning, which I assume is LLaMA 65B, would cost around ~$18M to train on Google Cloud, assuming you get a 50% discount on their listed GPU prices (2048 A100 GPUs for 5 months). And that's not even GPT-4 level.
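Rough back-of-the-envelope version of that estimate; the per-GPU-hour list price below is my own assumption, roughly in line with on-demand A100 pricing at the time:

    gpus = 2048
    hours = 5 * 30 * 24               # ~5 months of wall-clock time
    list_price_per_gpu_hour = 4.90    # assumed on-demand $/A100-hour
    discount = 0.5                    # the assumed 50% discount

    cost = gpus * hours * list_price_per_gpu_hour * discount
    print(f"~${cost / 1e6:.0f}M")     # ~$18M, hardware only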


$5M to train GPT-4 is the best investment I've ever seen. I've seen startups waste more money for tremendously smaller impact.


As I stated in my comment, $5M is assuming you can do a much, much better job than OpenAI at optimizing your training, only need to make a single training run, your employees' salaries are $0, and you get a clean dataset essentially for free.

Real cost is 10-20x that.

That's still a good investment though. But the issue is you could very well sink $50M into this endeavour and end up with a model that isn't actually very good and gets rendered useless by an open-source model released a month later.

OpenAI truly has unique expertise in this field that is very, very hard to replicate.


> and end up with a model that actually is not really good and gets rendered useless

ahem Bard ahem


You are confusing training with fine-tuning which is a different beast.


No I'm not - it's the full model on 8 GPUs for a couple hundred. After training I fine-tune for chat, but mostly for command-and-control tools, and then you fine-tune for the application.


"The great supercomputer Deep Thought is the most powerful computer ever built, with one exception. It was designed by hyper-intelligent pan-dimensional beings, who wanted to know the answer to Life, the Universe, and Everything.

Its creation annoyed a fair few philosophers, who felt that it was taking over their turf.

After seven and a half million years of serious cogitation, Deep Thought spoke the answer. However, it was so inexplicable that Deep Thought then had to go on and design the most powerful computer ever built (with no exceptions) to work out what the question was."


Feels very much like we are entering the Happy Vertical People Transporters era.

How long till the air con goes on strike for miserable working conditions?

"Brain the size of a planet and they ask me write a lesson plan in the style of a pirate" - chatgpt5, probably...


And that was earth, so are we asking the right questions?


And the answer to the question of life is 42.


I'd bet that what he, and the competition, is realizing is that the bigger models are too expensive to run.

Pretty sure Microsoft swapped out Bing for something a lot smaller in the last couple of weeks; Google hasn't even tried to implement a publicly available large model. And OpenAI still has usage caps on their GPT-4.

I'd bet that they can still see improvement in performance with GPT-5, but that when they look at the usage ratio of GPT3.5 turbo, gpt3.5 legacy, and GPT4, they realized that there is a decreasing rate of return for increasingly smart models - most people don't need a brilliantly intelligent assistant, they just need a not-dumb assistant.

Obviously some practitioners of some niche disciplines (like ours here) would like a hyperintelligent AI to do all our work for us. But even a lot of us are on the free tier of ChatGPT 3.5; I'm one of the few paying $20/mo for GPT4; and idk if even I'd pay e.g. $200/mo for GPT5.


> I'd bet that what he, and the competition, is realizing is that the bigger models are too expensive to run.

I think it's likely that they're out of training data to collect. So adding more parameters is no longer effective.

> most people don't need a brilliantly intelligent assistant, they just need a not-dumb assistant.

I tend to agree, and I think their pathway toward this will all come from continuing advances in fine tuning. Instruction tuning, RLHF, etc seem to be paying off much more than scaling. I bet that's where their investment is going to be turning.


Once they can add videos they will have a lot of new training data.


I think Sam is referring to transition from "Deep" to "Long" learning [1]. What new emergent properties, if any, will 1 billion tokens unlock?

[1] https://hazyresearch.stanford.edu/blog/2023-03-27-long-learn...


I don't quite understand this. The returns to scale have always been sublinear (i.e. "diminishing"), but the scale-maximalists didn't worry about that before. Also I heard Sam tout on the Lex Fridman podcast how amazing it is that the scaling laws worked so well for GPT-4. So I wonder what changed?

Sure yeah the cost numbers are getting very large and we can't keep scaling forever. But Google could easily 10x the training cost of GPT-4 if they thought it would protect their search business. I'm still skeptical that scaling is enough to reach the thresholds we want, but I'm surprised that it's being claimed right now when there's a huge rush of new money into the space. I wonder if this is some sort of misdirection by Sam


> So I wonder what changed?

Maybe cost & latency for both training and inference are getting too high. If costs doubled for every 5% better performance, would it be worth it? NVIDIA is making a small fortune from this.

> But Google could easily 10x the training cost of GPT-4 if they thought it would protect their search business

Google makes on the order of a few cents ($0.0X) per search query. If the inference cost of the model exceeds that, they go from making money to losing money. It is not clear whether the Bing integration is a money maker or a loss leader.


Once you've trained on the internet and most published books (and more...) what else is there to do? You can't scale up massively anymore.


You could have it start talking to itself in the way that AlphaGO learns to get better at Go. All that needs to be done is find some fitness function that indicates that useful knowledge has been produced. In Go and Chess this is easy.

It can start posting synthesized ideas on social media and see how many likes it gets. Coupled with a metric containing dissimilarity to current information, this could be a useful way to progress to superhuman insights.


Ilya Sutskever (OpenAI Chief Scientist): "Yeah, I would say the data situation is still quite good. There's still lots to go" - https://youtu.be/Yf1o0TQzry8?t=685

There was a rumor that they were going to use Whisper to transcribe YouTube videos and use that for training. Since it's multimodal, incorporating video frames alongside the transcriptions could significantly enhance its performance.
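If they did go that route, the transcription half is nearly trivial with the open-source whisper package; a minimal sketch (the file path is a placeholder, and the real work would be the downloading, deduplication and filtering):

    import whisper

    model = whisper.load_model("large-v2")            # or "base" for quick tests
    result = model.transcribe("downloaded_video.mp4")
    with open("transcripts.txt", "a") as f:
        f.write(result["text"] + "\n")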


And why will google allow them to do that at scale?


Can google stop them? It’s trivial to download YouTube videos


It’s trivial to download some YouTube videos.

But I am quite sure that if you start doing it at scale, Google will notice.

You could be sneaky, but people in this business talk (since they know another good-paying job is just around the corner), so it would likely come out.


OpenAI can afford to build a scraping network, built with thousands of proxies.


They can afford to. But will Microsoft risk doing that, and a possible massive lawsuit from Google?

Google also owns a lot of its own backbone, so it would be a lot easier for them to play network games.

And they could even get sneaky and try poisoning the data if it comes to that.

And since OpenAI knows that (they probably have people who used to work at Google at some point), they are unlikely to try.

Even less likely if Microsoft knew. MS is probably the only company that has even more lawyers than Oracle, and they would not approve.


One way would be to get people to let the AI watch as they interact with a computer (watch YouTube or perform other tasks). You might even be able to offload some of the computation to the local system.


Why would they ask Google for permission?


Because YouTube is owned by Google, and Google can stop downloads at scale.


I am curious how much video-to-text content represents compared to pure text. I have no idea.


If you were devious enough, you could be listening in on billions of phone conversations and messages and adding that to your data set.

This also makes me doubt that the NSA hasn't already cracked this problem. Or that China won't eventually beat current western models, since it will likely have way more data collected from its citizenry.


I wonder what percentage of phone calls would add anything meaningful to models, I imagine that the nature of most phone calls are both highly personal and fairly boring.


That's a fair point. Not at all like training on Wikipedia in which nearly every sentence has novelty to it.

Then again it would give you data on every accent in the country, so the holy grail for modelling human speech.


You can transcribe all spoken words everywhere and keep the model up to date? Keep indexing new data from chat messages, news articles, new academic work etc.

The data is not finite.


What about all the siloed content kept inside corporate servers? You won't get normal GPT to train on it, of course, but IBM could build a "IBM-bot" that has all the GPT-4 dataset + all of IBM's internal data.

That model might be very well tuned to solve IBM's internal problems.


I don't think you can just feed it data. You've got to curate it, feed it to the LLM, and then manually check/further train the output.

I also question that most companies have the volume and quality of data worth training on. It's littered with cancelled projects, old products, and otherwise obsolete data. That's going to make your LLM hallucinate/give wrong answers. Especially for regulated and otherwise legally encumbered industries. Like can you deploy a chat bot that's wrong 1% or 0.1% of the time?


Well, IBM has 350k employees. If training a LLM on curated data costs tens of millions of dollars but ends up reducing headcount by 50k, it would be a massive win for any CEO.

You have to understand that all the incentives are perfectly aligned for corporations to put this to work, even spending tens of millions in getting it right.

The first corporate CEO who announces that his company used AI to reduce employee costs while increasing profits is going to get such a fat bonus that everyone will follow along.


Since GPT-4 is being integrated into the MS Office suite, this is an "in" to corporate silos. The MS cloud apps can see inside a great many of those silos.


Video content (I don't know why someone flagged Jason for saying such, he is totally right)


Looking at his post history, seems like he was shadowbanned.


Right. They've already sucked in most of the good general sources of information. Adding vast amounts of low-quality content probably won't help much and might degrade the quality of the trained model.


Videos - all of youtube, all the movies, everything that's ever been captured on film. Transcribe the audio, automatically describe the images and try to predict the next one.


The problem is, movies are completely full of misinformation and inaccuracies.

Maybe if you trained it on movies before CGI existed ?


YouTube. This is where Google has a huge advantage, having the largest collection of user-generated video.


Yeah, but it's not like the videos are private. Surely Amazon has the real advantage, given they have a ton of high quality tokens in the form of their kindle library and can make it difficult for OpenAI to read them all.


They didn't train it on the entire internet though, only a small fraction of it. Still plenty they could do.


Real answer? Buy proprietary data from social media companies, credit card companies, retail companies and train the model on that data.


Can't wait for us to be able to query GPT for people's credit card info.


Video.

> YouTubers upload about 720,000 hours of fresh video content per day. Over 500 hours of video were uploaded to YouTube per minute in 2020, which equals 30,000 new video uploads per hour. Between 2014 and 2020, the number of video hours uploaded grew by about 40%.


But what are you mostly "teaching" the LLM then? Mundane everyday stuff? I guess that would make them better at "being average human" but is that what we want? It already seems that prompting the LLM to be above-average ("pretend to be an expert") improves performance.


This whole conversation about training set size is bizarre. No one ever asks what’s in the training set. Why would a trillion tokens of mundane gossip improve a LLMs ability to do anything valuable at all?

If a scrape of the general internet, scientific papers and books isn’t enough, a trillion trillion trillion text messages to mom aren’t going to change matters.


I doubt they have trained on 0.1% of the tokens that are 'easily' available (that is, available with licencing deals that are affordable to OpenAI/MSFT).

They might have trained on a lot of the 'high quality' tokens, however.


people seem to have forgotten about the multi-modal GPT-4

There's a ton of potential left on the table. The question is if transformers have hit their limit with GPT-4 or not.

It's a pretty simple equation when you think about it this way and why Sam would say they have hit their limit. Sam is basically Microsoft and they want to retain their lead. Once Google learns to put their data to use correctly, it's almost guaranteed game over for OpenAI if they want it to be.


You can generate textual examples that teach logic, multi-dimensional understanding and so on - similar to the ones in math books, but at a massive scale.
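A toy sketch of what that kind of synthetic-data generation looks like (the templates here are made up; real pipelines use far richer templates and verify the answers programmatically):

    import random

    NAMES = ["Ada", "Bob", "Chen", "Dina"]

    def make_example() -> str:
        a, b = random.randint(2, 99), random.randint(2, 99)
        who = random.choice(NAMES)
        question = f"{who} has {a} apples and buys {b} more. How many apples does {who} have now?"
        answer = f"{a} + {b} = {a + b}"
        return question + "\n" + answer

    # Generate as many textbook-style examples as you want to mix into training.
    corpus = [make_example() for _ in range(100_000)]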


> Once you've trained on the internet and most published books (and more...) what else is there to do? You can't scale up massively anymore.

Dataset size is not relevant to predicting the loss threshold of LLMs. You can keep pushing loss down by using the same sized dataset, but increasingly larger models.

Or augment the dataset using RLHF, which provides an "infinite" dataset to train LLMs on. That's limited by the capabilities of the scoring model, but of course you can scale the scoring model too, so again the limit isn't dataset size but training compute.


> Dataset size is not relevant to predicting the loss threshold of LLMs. You can keep pushing loss down by using the same sized dataset, but increasingly larger models.

DeepMind and others would disagree with you! No one really knows, in actual fact.

[1] https://www.deepmind.com/publications/an-empirical-analysis-...


I don't recall the Chinchilla paper disputing my point. They establish "training-compute optimal" scaling laws, but none of their findings suggest that loss hits any kind of asymptote.


Perhaps we're talking past each other, is "loss threshold" a specific term in LLM literature?

Merely pointing out that the debate as to whether we are compute- or data-limited (OP) has not concluded at all; there are lots of compelling theories on the relationship between the two.


In other words, "this approach is getting too expensive for us and we need to control the narrative for our model sizes going down".

Most increases in ImageNet etc. scores came from bigger models. "The Unreasonable Effectiveness of Data" has aged very well. It seems very convenient for OpenAI that this trend should be over a few months after their "eye-wateringly expensive" comment.


SamA strikes me as overly disingenuous - as if most of his interactions are marketing pieces / warfare. The entire lex + sama podcast seemed this way to me.

I’m incredibly biased though and feel slighted (as a part of humanity) by openAI’s actions, so perhaps I’m looking at his actions through a hateful lens.


I feel the same. His narrative seems very calculated. It doesn't help that OpenAI is very much the most closed SaaS thing to be released in recent years. I have been a paying customer for some time and have ChatGPT Plus as well. I am still on the waiting list for the GPT-4 API.

Not only that, but their business model is completely unclear, which is the scariest part for me, as an individual developer.

Let's go one step further: they use our texts online, perhaps including this very comment. They have software that can analyse all our texts at scale. But we don't even get API access to this thing.

It is the most self-serving software ever released, consuming human creativity: both the content and its future relevance.


They could have made it open source and people like Gerganov could rewrite the code in C/C++, but god forbid they actually be open.


https://archive.is/s4V9e

He did not say what kind of research strategies or techniques might take its place. In the paper describing GPT-4, OpenAI says its estimates suggest diminishing returns on scaling up model size. Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them.


> In the paper describing GPT-4, OpenAI says its estimates suggest diminishing returns on scaling up model size.

I read the two papers (gpt 4 tech report, and sparks of agi) and in my opinion they don't support this conclusion. They don't even say how big GPT-4 is, because "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."

> Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them.

OK so his argument is like "the giant robots won't be powerful, but we won't show how big our robots are, and besides, there are physical limits to how giant of a robot we can build and how quickly we can build it." I feel like this argument is sus.


OpenAI has likely run into a wall (or is about to) for model size given its funding amount/structure[1], unlike its competition, who actually own data centers and have lower marginal costs. It's just like when peak-iPad Apple claimed that a "post-PC" age was upon us.

1. What terms could Microsoft wring out of OpenAI for another funding round?


The Age of Giants is over... The Age of Behemoths has begun!

but sir, that means the same thing

Throw this heretic into the pit of terror.


The pit of terror is full.

Fine, to the outhouse of madness then.

Before I get nuked from orbit for daring to entertain humor: if someone is running far ahead of me in a marathon, still broadcasting things back to the slow people (like myself), and then suddenly says "you know what guys, we should stop running in this direction, there's nothing to see here" right before anyone else is able to verify the veracity of their statement, perhaps it would still be in the public interest for at least one person to verify what they are saying. Given how skeptical the internet at large has been of Musk's acquisition of a company, it's interesting that the skepticism is suddenly put on hold when looking at this part of his work...


Eh.

Altman has a financial incentive to lie and obfuscate about what it takes to train a model like GPT-4 and beyond, so his word is basically worthless.


First of all, if Altman continually makes misleading statements about AI he will quickly lose credibility, and the short-term gain from whatever 'financial incentive' birthed the lie would be eroded in short order by a lack of trust in the head of one of the most visible AI companies in the world.

Secondly, all the competitors of OpenAI can plainly assess the truth or validity of Altman's statements. There are many companies working in tandem on things at the OpenAI scale of models, and they can independently assess the usefulness of continually growing models. They aren't going to take this statement at face value and change their strategy based on a single statement by OpenAI's CEO.

Thirdly, I think people aren't really reading what Altman actually said very closely. He doesn't say that larger models aren't useful at all, but that the next sea change in AI won't be models which are orders of magnitude bigger, but rather a different approach to existing problem sets. Which is an entirely reasonable prediction to make, even if it doesn't turn out to be true.

All in all, "his word is basically worthless" seems much too harsh an assessment here.


I've seen Altman say in an interview that training GPT-4 took "hundreds of little things".

I don't find this implausible, but it folds slightly to Ockham's razor when you consider that this is the exact type of statement that would be employed to obfuscate a major breakthrough.

It just makes me raise an eyebrow and look to more credible sources.


It is possible that GP meant that Altman’s word is basically worthless to them, in which case that’s not something that can be argued about. It’s a factually true statement that that is their opinion of that man.

I personally can see why someone could arrive at that position. As you’ve pointed out, taking Sam Altman at face value can involve suppositions about how much he values his credibility, how much stock OpenAI competitors put in his public statements, and the mindsets people in general have when reading what he writes.


Elon Musk has been constantly doing this and thriving.


Hacker News posters too


Anyone with the expertise to have insightful takes in AI also has a financial incentive to steer the conversation in particular directions. This is also the case for many, many other fields! You do not become an expert by quarantining your livelihood away from your expertise!

The correct response is not to dismiss every statement from someone with a conflict of interest as "basically worthless", but to talk to lots of people and to be reasonably skeptical.


OpenAI has gone from open-sourcing its work, to publishing papers only, to publishing papers that omit important information, to GPT-4 being straight-up closed. And Sam Altman doesn't exactly have a track record of being overly concerned about the truth of his statements.


I had a fun conversation (more like an argument) with ChatGPT about the hypocrisy of OpenAI. It would explicitly contradict itself and then start every reply with “I can see why someone might think…” before regurgitating fluff about democratizing AI. I finally was able to have it define democratization of technology and then recognize the absurdity of using that label to describe a pivot to gating models and being for-profit. Then it basically told me “well it’s for safety and protecting society”.

An AI, when presented with facts counter to what it thought it should say, agreed and basically went: “Won’t someone PLEASE think of the children!”

Love it.


Transcripts of other people's GPT chats are like photos of other people's kids.


It was trained on a corpus full of mainstream media lies, why would you have expected otherwise? It's by far the most common deflection in its training set.

It's easy to recognize and laugh at the AI replying with the preprogrammed narrative. I'm still waiting for the majority of people to realize they are being given the same training materials, non-stop, with the same toxic narratives, and becoming programmed in the same way, and that this is what results in their current worldview.

And no, it's not enough to be "skeptical" of mainstream media. It's not even enough to "validate" them. Or to go to other sources. You need to be reflective enough to realize that they are pushing flawed reasoning methods, and then abusing them again and again, to get you used to their brand of reasoning.

Their brand of reasoning is just basically reasoning with brands. You're given negative sounding words for things they want you to think are bad, and positive sounding words for things they want you to think are good, and continuously reinforce these connections. They brand true democracy (literally rule of the people) as populism and tell you it's a bad thing. They brand freedom of speech as "misinformation". They brand freedom as "choice" so that you will not think of what you want to do, but which of the things they allow you to do will you do. Disagree with the scientific narrative? You're "science denier". Even as a professional scientist. Conspiracy theory isn't a defined word - it is a brand.

You're trained to judge goodness or badness instinctively by their frequency and peer pressure, and produce the explanation after your instinctive decision, instead of the other way around.


>> Their brand of reasoning is just basically reasoning with brands.

Excellent post, especially this part. Sums up the problem perfectly.


"Then it basically told me “well it’s for safety and protecting society”."

That was pretty much OpenAI's argument when they first published that GPT-3 paper. "Oh no so scary people might use it for wrong stuff, only we should have control of it."

In reality they just saw $$$ blink.


Without getting into morality.

It’s pretty easy to have chatGPT contradict itself, point it out and have the LLM respond « well, I’m just generating text, nobody said it had to be correct »


Why are you discussing OpenAI with ChatGPT? I’m honestly interested.

I would imagine that any answer of ChatGPT on that topic is either (a) „hallucinated“ and not based on any verifiable fact or (b) scripted in by OpenAI.

The same question pops up for me whenever someone asks ChatGPT about the internals and workings of ChatGPT. Am I missing something?


Simple curiosity. I wanted to see if it could explain the shift in OpenAIs operating in a way that might give some interesting or perhaps novel insight (even if hallucinated) other than what their corpo-speak public facing reasoning is.

For the most part it just regurgitated the corpo-speak with an odd sense of confidence. I know that’s the point of the model, but it can also be surprisingly honest when it incorporates what it knows about human motivation and business.


I’ve tried, because it’s tempting and the first attempts do give a « conversation » vibe.

I was curious about state persistence between prompts, how to make my prompts better, and getting an idea of the training data.

Only got crap and won’t spend time doing that again


This trend has happened in the small for their APIs as well. They've been dropping options - the embeddings aren't the internal embeddings any more, and you don't have access to log probabilities. It's all closing up at every level.


It's incredible that people are so eager to eat up these unsupported claims.

This is the second [1] OpenAI claim in the span of a few days that conveys a sense of "GPT-4 represents a plateau of accomplishment. Competitors, you've got time to catch up!".

And it's not just a financial incentive, it's a survival incentive as well. Given a sufficiently sized (unknowable ahead of time) lead, the first actor that achieves AGI and plays their cards right, can permanently suppress all other ongoing research efforts should they wish to.

Even if OpenAI's intentions are completely good, failure to be first could result in never being able to reach the finish line. It's absolutely in OpenAI's interest to conceal critical information, and mislead competing actors into thinking they don't have to move as quickly as they can.

[1] https://news.ycombinator.com/item?id=35570690


In this case I think it's Wired that's lying. Altman didn't say large models have no value, or that there will be no more large models, or that people shouldn't invest in large models.

He said that we are at the end of the era where capability improvements come primarily from making models bigger. Which stands to reason... I don't think anyone expects us to hit 100T parameters or anything.


"640KB of memory ought to be enough for anybody!"

- Abraham Lincoln


But just look at what all Lincoln accomplished with 640KB of memory. In the grand examination of time, one might even say that Lincoln is a more important figure than ChatGPT itself.


What about 1T though, seems silly to stop here.


Like Altman said, it's comparable to the GHz race in the 1990's. If 4GHz is good, 5GHz is better, why not 10GHz?

Turns out there are diminishing returns and advances come from other dimensions. I've got no opinion on whether he's right or not, but he's certainly in a better position to opine that current scale has hit diminishing returns.

In any event, there's nothing special about 1T parameters. It's just a round base-10 number. It is no more magic than 900B or 1.3T.


It could also be argued that there's a financial incentive to just say "giving us more money to train bigger models = better AI" forever.


I don't think these comments are driven by financial incentives. It's a distraction, and only a fool would believe Altman here. What this likely means is that they are prioritizing adding more features to their current models while they train the next version. Their competitors scramble to build an LLM with some sort of intelligence parity; when that happens, no one will care because ChatGPT has the ecosystem and plugins and all the advanced features... and by the time their competitors reach feature parity in that area, OpenAI pulls its ace card and drops GPT-5. Rinse and repeat.

That's my theory and if I was a tech CEO in any of the companies competing in this space, that is what I would plan for.

Training an LLM will be the easy part going forward. It's building an ecosystem around it and hooking it up to everything that will matter. OpenAI will focus on this, while not-so-secretly training their next iterations.


text-davinci-003 but cheaper and running on your own hardware is already a massive selling point. If you release a foundation model at parity with GPT-4 you'll win overnight, because OpenAI's chat completions are awful even with the super advanced model.


Does he even have any background in machine learning? I always found it bizarre that he was chosen to be OpenAI's CEO...


On the Lex Fridman podcast, he pretty much admitted he's not an AI (per se) and isn't the most excited about the tech (as he could be).


> he pretty much admitted he's not an AI

Yeah, I also had a hunch he wasn't an AI. (I assume you meant "AI researcher" there :))

All joking aside, I wonder how that's affecting company morale or their ability to attract top researchers. I know if I was a top AI researcher, I'd probably rather work at a company where the CEO was an expert in the field (all else being equal).


Honestly I'm not sure it matters that much. CEOs who are not experts or researchers in a domain can still build great companies and empower their employees to do incredible work. Lots of tech people absolutely love to point out that Steve Jobs was not an engineer, but under his leadership the company invented three products that totally revolutionized different industries. Now, I'm not going to sit here and say Altman is Jobs, but running a company, knowing how to raise money, knowing how to productize technologies, etc. are all very important skills that industry researchers aren't always good at.


I feel most CEOs are not top of their field but rather people who can take a vision and run with it.


It might be true in general; however, AI research laboratories are typically an exception, as they are often led by experienced AI researchers or scientists with extensive expertise in the field.


And that's why they have a hard time getting their stuff out there and getting the money they need. I mean, trying to run a business like a research lab is kind of flawed, you know? And you don't always want some Musk-like character messing around with the basics of the company


I think Ilya leading the tech is plenty morale boosting


Ilya gives numerous talks and interviews, and he's well worth listening to about technical matters. I listened to many of his talks recently, and the main theme is that scaling up compute works, and will continue to do so. His optimism about the potential of scaling to support deep learning has clearly guided his entire career, starting with his early success on AlexNet.


I listened to that podcast but didn't catch the part where he says he isn't that excited. When was that?


He has background in CEO (smooth-talking charmer in the VC crowd). That's why he's CEO.


Do you think GPT-4 was trained and then immediately released to the public? Training finished Aug 2022. They spent the next 6 months improving it in other ways (eg human feedback). What he is saying is already evident therefore.


People can be honest even when money is involved. His word is worthless because it's Altman


IIRC Altman has no financial stake in the success or failure of OpenAI to prevent these sorts of conflicts of interests between OpenAI and society as a whole


https://www.cnbc.com/2023/03/24/openai-ceo-sam-altman-didnt-... (https://news.ycombinator.com/item?id=35289044 - 24 days ago; 158 points, 209 comments)

> OpenAI’s ChatGPT unleashed an arms race among Silicon Valley companies and investors, sparking an A.I. investment craze that proved to be a boon for OpenAI’s investors and shareholding employees.

> But CEO and co-founder Sam Altman may not notch the kind of outsize payday that Silicon Valley founders have enjoyed in years past. Altman didn’t take an equity stake in the company when it added the for-profit OpenAI LP entity in 2019, Semafor reported Friday.


Right. All the evidence points to more potential being left on the table for emergent abilities. It would make no sense that the model would develop all of these complex skills for better predicting the next token, then just stop.

It's a massive bet for a company to push compute into the billion dollar range - if saying something like this has the potential to help ward off those decisions, I don't see what's stopping them from saying it.

I basically see Microsoft talking when Sam talks.


Citation needed. What are his financial incentives?


We will need a combination of technologies we have in order to really achieve emergent intelligence.

Humans are composed of various "subnets" modelling aspects which, in unison, produce self-consciousness and real intelligence. What is missing in the current line of approaches is that we only rely on auto-alignment of subnetworks by machine learning, which scales only up to a point.

If we would produce a model which has

* something akin a LLM as we know it today, which is able to

* store or fetch facts to a short- ("context") or longterm ("memory") storage

* if not in the current "context", query the longterm context ("memory") by keywords for associations, which are one-by-one inserted into the current "context"

* repeat as required until fulfilling some self-defined condition ("thinking")

To me, this is mostly mechanical plumbing work and lots of money (a rough sketch of such a loop is included below).

Also, if we get rid of the "word-boundedness" of LLMs - which we already did to some degree, as shown by the multi-language capabilities - LLMs would be free to roam in the domain of thoughts /s :)

This approach could be further improved by meta-LLMs governing the longterm memory access, providing an "intuition" which longterm memory suits the provided context best. Apply recursion as needed to improve results (paying by exponential training time, but this meta-NN will quite probably be independent of actual training, as real life / brain organization shows).
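A minimal sketch of that loop, assuming a hypothetical complete() wrapper around whatever LLM you use; the keyword-matched dictionary is a stand-in for the long-term memory (a real system would use embeddings and a vector store):

    LONG_TERM_MEMORY = {
        "transformers": "Attention-based architecture introduced in 2017.",
        "scaling laws": "Loss falls predictably as compute, data and params grow.",
    }

    def recall(keyword_text, memory):
        # Fetch long-term memories whose keys appear in the requested keywords.
        return [text for key, text in memory.items() if key in keyword_text]

    def think(question, complete, max_steps=5):
        context = [question]
        answer = ""
        for _ in range(max_steps):
            # Ask the model which associations it wants pulled into the context.
            keywords = complete("List keywords to recall for: " + " ".join(context))
            context.extend(recall(keywords.lower(), LONG_TERM_MEMORY))
            answer = complete("Answer using this context: " + " ".join(context))
            if "FINAL:" in answer:   # self-defined stopping condition ("thinking" is done)
                break
        return answer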


The other element that may be required is some version of the continuous sensory input that, for us, creates the sensation of "living" and, this one is a bit more philosophical, the sensation of suffering, along with a baseline establishing that the goal of the entity is to take actions that help it avoid suffering.


That's when it gets dangerous, when we try to really recreate animal (human) characteristics in digital form. Combining that with likely 1000-1000000 X increases in performance leads to these superintelligent digital creatures taking over.

Instead, we can focus on the Star Trek computer type stuff that we have with GPT and be incredibly careful about deploying those more animal/human-like models and higher-performance compute. Especially if we deliberately create the next species in digital form, make it 100X or 10000X faster-thinking/smarter than us, and then enslave it, that is not only totally stupid but also proven unnecessary by the generality of the latest AI models.


I think an AI may have extra qualities by feeling suffering etc., but I don't think these extra qualities are rationally beneficial.


Check out David Shapiro's cognitive architecture stuff.


Saying “hey, don’t go down the path we are on, where we are making money and are considered the best in the world... it’s a dead end” rings pretty hollow... like “don’t take our lunch, please?” It feels like a similar statement.


Nah - GPT-4 is crazy expensive: paying $20/mo only gets you 25 messages per 3 hours, and it's crazy slow. The API is rather expensive too.

I'm pretty sure that GPT-4 is ~1T-2T parameters, and they're struggling to run it (at reasonable performance and profit). So far their strategy has been to 10x the parameter count every GPT generation, and the problem is that there are diminishing returns every time they do that. AFAIK they've now resorted to chunking GPT through the GPUs because of the 2 to 4 terabytes of VRAM required (at 16-bit).

So now they've reached the edge of what they can reasonably run, and even if they do 10x it, the expected gains are less. On top of this, models like LLaMA have shown that it's possible to cut the parameter count substantially and still get decent results (albeit the open-source stuff still hasn't caught up).

On top of all of this, keep in mind that at 8-bit resolution 175B parameters (GPT-3.5) requires over 175GB of VRAM. This is crazy expensive and would never fit on consumer devices. Even if you quantize down to 4-bit, you still need over 80GB of VRAM.

This definitely is not a "throw them off the trail" tactic - in order for this to actually scale the way everyone envisions, both in performance and in running on consumer devices, research HAS to focus on reducing the parameter count. And again, there's lots of research showing it's very possible to do.

tl;dr: smaller = cheaper+faster+more accessible+same performance
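The weight-memory arithmetic in that comment, spelled out (weights only; the KV cache and activations come on top of this):

    params = 175e9                                  # GPT-3-class parameter count
    bytes_per_param = {"fp16": 2, "int8": 1, "int4": 0.5}

    for fmt, b in bytes_per_param.items():
        print(f"{fmt}: ~{params * b / 1e9:.0f} GB of VRAM for the weights alone")
    # fp16: ~350 GB, int8: ~175 GB, int4: ~88 GB -> even 4-bit needs over 80 GB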


Yeah, I am noticing this as well. GPT enables you to do difficult things really easily, but it is so expensive that you would need to replace it with custom code for any long-term solution.

For example: you could use GPT to parse a resume file, pull out work experience and return it as JSON. That would take minutes to set up using the GPT API versus weeks to build your own system, but GPT is so expensive that building your own system is totally worth it.

Unless they can seriously reduce how expensive it is I don't see it replacing many existing solutions. Using GPT to parse text for a repetitive task is like using a backhoe to plant flowers.
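For reference, the "minutes to set up" version really is about this much code with the openai package's ChatCompletion API of the time; the prompt and JSON keys are illustrative, and you still have to handle malformed output:

    import json
    import openai

    openai.api_key = "sk-..."  # your API key

    def extract_work_experience(resume_text: str) -> list:
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            temperature=0,
            messages=[
                {"role": "system",
                 "content": "Extract work experience from the resume. Reply only with "
                            "a JSON list of objects with keys 'company', 'title', "
                            "'start', 'end'."},
                {"role": "user", "content": resume_text},
            ],
        )
        # The model can still return malformed JSON, so wrap this in try/except in real use.
        return json.loads(response["choices"][0]["message"]["content"])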


> For example: you could use GPT to parse a resume file, pull out work experience and return it as JSON. That would take minutes to setup using the GPT API and it would take weeks to build your own system, but GPT is so expensive that building your own system is totally worth it.

True, but an HR SaaS vendor could use that to put on a compelling demo to a potential customer, stopping them from going to a competitor or otherwise benefiting.

And anyway, without crunching the numbers, for volumes of say 1M resumes (at which point you've achieved a lot of success) I can't quite believe it would be cheaper to build something when there is such a powerful solution available. Maybe once you are at 1G resumes... My bet is still no though.


I work for a company on the web development team. We have ~6 software developers.

I'd love to be able to just have people submit their resumes and extract the data from there, but instead I'm going to build a form and make applicants fill it out, because ChatGPT is going to cost at least US$0.05 depending on the length of the resume.

I'd also love to have mini summaries of order returns written in human-readable form, but that would also cost US$0.05 per form.

The tl;dr here is that there's a TON of use cases for an LLM outside of your core product (we sell clothes), but we can't currently justify that cost. Compare that to the rapidly improving self-hosted solutions, which don't cost US$0.05 per query (and likely more for anything useful).


5 cents. Per resume. $500 per 10k. That's 1-3 hours of a fully loaded engineer's salary per year. You are being criminally cheap.


The problem is that it would take us the same amount of time to just add a form with django. Plus you have to handle failure cases, etc.

And yeah I agree this would be a great use-case, and isn't that expensive.

I'd like to do this in lots of places, and the problem is I have to convince my boss to pay for something that otherwise would have been free.

The conversation would be "We have to add these fields to our model, and we either tell django to add a form for them, which will have zero ongoing cost and no reliance on a third party, or we send the resume to OpenAI, pay for them to process it, build some mechanism to sanity-check what GPT is responding with, alert us if there are issues, and then put it into that model, and pay 5 cents per resume."

> 1-3 hours of a fully loaded engineers salary per year.

That's assuming 0 time to implement, and because of our framework it would take more hours to implement the openai solution (that's also more like 12 hours where we are).

> $500 per 10k.

I can't stress this enough - the alternative is $0 per 10k. My boss wants to know why we would pay any money for a less reliable solution (GPT serialization is not nearly as reliable as a standard django form).

I think within the next few years we'll be able to run the model locally and throw dozens of tasks just like this at the LLM, just not yet.


There are excellent commercial AI resume parsers already - Affinda.com being one. Not expensive and takes minutes to implement.


For a big company that is nothing, but if you are bootstrapping and trying to acquire customers with an MVP, racking up a $500 bill is frightening. What if you offer a free trial, blow up, and end up with a $5k+ bill?


By these maths, the $500 bill is for 10K resumes.

To show an MVP to a customer you only need 10 resumes (or 1 in most demos I've been in).

So 50c.


Also you could likely use GPT3.5 for this and still get near perfect results.


> near perfect results.

I have tried GPT-3.5 and GPT-4 for this type of task - the "near perfect results" part is really problematic, because you need to verify that the output is likely correct, get notified if there are issues, and even then you aren't 100% sure that it selected the correct first/last name.

This is compared to a standard HTML form, which is... very reliable and (for us) automatically has error handling built in, including alerts to us if there's a 504.


You could use those examples to finetune a model only for resume-data extraction.


I don't think this argument really holds up.

GPT3 on release was more expensive ($0.06/1000 tokens vs $0.03 input and $0.06 output for GPT4).

Reasonable to assume that in 1-2 years it will also come down in cost.


> Reasonable to assume that in 1-2 years it will also come down in cost.

Definitely. I'm guessing they used something like quantization to optimize VRAM usage down to 4-bit. The thing is that if you can't fit the weights in memory then you have to chunk them, and that's slow = more GPU time = more cost. And even if you can fit them in GPU memory, less memory = fewer GPUs needed.

But we know you _can_ use fewer parameters, and that the training data + RLHF make a massive difference in quality. And model size relates linearly to the VRAM requirements/cost.

So if you can get a 60B model to run at 175B quality, then you've cut your memory requirements to roughly a third, and can now run (with 4-bit quantization) on a single A100 80GB, which is 1/8th of the previously known 8x A100s that GPT-3.5 ran on (and still half of GPT-3.5 at 4-bit).

Also, while OpenAI likely doesn't want this, we really want these models to run on our own devices, and LLaMA + fine-tuning has shown promising improvements (not there just yet) at 7B size, which can run on consumer devices.


It's never been in OpenAIs interest to make their model affordable or fast, they're actually incentivized to do the opposite as an excuse to keep the tech locked up.

This is why Dall-e 2 ran in a data centre and Stable Diffusion runs on a gamer GPU


I think you're mixing the two. They do have an incentive to make it affordable and fast because that increases the use cases for it, and the faster it is the cheaper it is for them, because the expense is compute time (half the time ~= half the cost).

> This is why Dall-e 2 ran in a data centre and Stable Diffusion runs on a gamer GPU

This is absolutely why they're keeping it locked up. By simply not releasing the weights, you can't run Dalle2 locally, and yeah they don't want to do this because they want you to be locked to their platform, not running it for free locally.


It's a pretty sus argument for sure when they're scared to release even parameter size.

Although the title is a bit misleading about what he was actually saying. Still, there's a lot left to go in terms of scale. Even if it isn't parameter size (and there's still lots of room here too, it just won't be economical), contrary to popular belief, there's lots of data left to mine.


Everyone hoping to compete with OpenAI should have an "Always do the opposite of what Sam says" sign on the wall.


Ok cool, so release the weights and your research.


Ilya Sutskever from OpenAI saying that the data situation is good and there's more data to train on - https://youtu.be/Yf1o0TQzry8?t=657


Maybe more data to train on while keeping model size the same


Sam Altman and OpenAI must be pretty nervous. They have first mover advantage but they hold no hook or moat.

Unless they can somehow keep their improvements ahead of the rest of the industry then they'll be lost among a crowd.


Yeah. Sadly I believe they may lose their spot due to their concern for safety. They got here by pushing the envelope. But now, with everyone freaking out, he's putting off the training of a new model due to 'safety issues' (with plugins, I suspect).


On the other hand, if Microsoft can change enough people's habits of using Google to search, they'll be happy as a clam.


Is anyone aware of techniques to prune a model from useless knowledge to leave more space for the reasoning capabilities?

It really shouldn't matter that it can give the exact birthdate of Steve Wozniak, as long as it can properly make a query to fetch it and deal with the result.


I follow your design - couldn't you also address hallucinations with a "fact checking" LLM (connected to search) that corrects the output of the core LLM? You would take the output of the core LLM and send it to the fact checker with a prompt like "evaluate this output for any potentially false statements, and perform an internet search to validate and correct them".
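Something along these lines, where complete() and web_search() are hypothetical stand-ins for your LLM call and search API; whether this actually fixes hallucinations is an open question:

    def fact_check(draft: str, complete, web_search) -> str:
        # 1. Have the checker model list the factual claims in the draft.
        claims = complete("List the factual claims in this text, one per line:\n" + draft)
        evidence = []
        for claim in claims.splitlines():
            if claim.strip():
                evidence.append(claim + " => " + web_search(claim))
        # 2. Have it rewrite the draft, correcting anything the evidence contradicts.
        return complete(
            "Rewrite the text, correcting any statements contradicted by the evidence.\n"
            "Text:\n" + draft + "\nEvidence:\n" + "\n".join(evidence)
        )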


You can't trust what sama says. Period. He's shown his willingness to deceive for gain. He deleted the Codex model because it was better than the text model, and the only way to explain it is that by training on code instead of just text of human interactions, the model develops stronger reasoning abilities (because of higher clarity around logic and higher signal-to-noise overall in the training data). So he removed it so researchers don't catch on. Also, why is 1 trillion params the limit? Why not 2 trillion? He's been begging for regulations and trying to pull up the ladder behind him. Just really "street smart" thinking, but I have not seen any elevated, humanity-serving thinking coming from him. He would not have taken $100M for a non-profit to do open-source research only to bait-and-switch into for-profit, closed-source, sold to the highest bidder. Give me a break. Don't believe anything he says (well, use your brain, as he may mix truths and deceptions with the goal of deceiving his competitors... he's not out there to do a service for humanity; if he was, he would have kept it open and non-profit, not basically steal the funds of the non-profit to pocket $29B from it). It's ridiculous.


The role of a CEO is more to benefit the company than the public. Only time will tell.

I am curious, though, how something like Moore's Law relates to this. Yes, model architectures will deal with complexity better, and the amount of data helps as well. There must be a relation between technological innovation and cost that bears on effectiveness: innovation in computation, model architecture, quality of data, etc.


The 2017 Transformers paper has ~71,000 papers citing it. The sheer magnitude of human mental effort globally that is chasing the forefront of machine learning is unprecedented and amazing.


Local minimum. LeCun is right.


"When we set the upper limit of PC-DOS at 640K, we thought nobody would ever need that much memory."

Bill Gates


Gates has refuted saying this. Are you implying by analogy that Altman hasn't said/will disclaim saying that "the age of giant AI models is almost over"?


Just that there is tremendous hubris in the statement—at least when the statement stands alone. Vastly larger LLMs will probably become one or more relatively small components or layers of much larger systems that run on whatever we use as telephones in 20 years.

I suspect what he means is that OpenAI is finding diminishing returns from throwing money and hardware at larger models right now and that they are investigating other and/or composite AI techniques that make more optimal use of their hardware investment.


That was in context of a how much RAM would be needed for software of that era, not the entire future of computing.


Interesting how this contradicts "The Bitter Lesson": http://incompleteideas.net/IncIdeas/BitterLesson.html.


I don't think there is a contradiction at all. Altman is essentially saying they are running out of compute and therefore can't meaningfully scale further. Not that scaling further would be a worse plan longterm than coming up with new algorithms.


There's plenty of room for models to continue to grow once efficiency is improved. The basic premise of the Google ML pathways project is sound, you don't have to use all the model all the time. By moving to sparse activations or sparse architectures you can do a lot more with the same compute. The effective model size might be 10x or 100x GPT-4 (speculated at 1T params) but require comparable or less compute.

While not a perfect analogy it's useful to remember that the human brain has far more "parameters", requires several orders of magnitude less energy to train and run, is highly sparse, and does a decent job at thinking.


The efficiency thing is what worries me. I think Nvidia has rough ideas for increasing efficiency 100 - 1000 times without changing the fundamental paradigm (i.e. memristors or crossbar arrays or something). If it doesn't go to 1000, I assume there is a lot of investment ramping up for realizing the new fully compute-in-memory systems.

If the software can leverage these efficiency gains effectively, then the concerns about runaway AI will be very relevant. Especially since people seem to think that they need to emulate all animal (like human) characteristics to get "real" general intelligence. Despite the fact that GPT is clearly general purpose. And people make no real differentiation between the most dangerous types of characteristics like self-preservation or full autonomy.

GPT shows that we can have something like a Star Trek computer without creating Data. People should really stop rushing their plans to create an army of Datas and then enslave them. Totally unnecessary and stupid.


I believe Altman, but the title is misleading.

Have we exhausted the value of larger models on current architecture? Probably yes. I trust OpenAI would throw more $ at it if there was anything left on the table.

Have we been here before? Also yes. I recall hearing similar things about LSTMs when they were in vogue.

Will the next game changing architecture require a huge model? Probably. Don’t see any sign these things are scaling _worse_ with more data/compute.

The age of huge models with current architecture could be over, but that started what, 5 years ago? Who cares?


Also scaling doesn't address some of the challenges for AI that ChatGPT doesn't meet, like:

- learning to learn, aka continual learning

- internalised memory

bringing it closer to actual human capabilities.


The age of CEOs that recently got washed to the top saying dumbish things is just starting, though.


We arrived at the top of the tree in our journey to the moon.


No we haven't. The title is misleading. There's plenty of scale room left; part of it might just not be economical (parameter size), but there's data. If you take this to mean "we're at a dead end", you'd be very wrong.


"You can't get to the moon by climbing successively taller trees"


It's probably economically difficult to rationalize building a GPT5. They're already having a hard time getting everyone GPT4. Better to spend the resources optimizing to get everyone onto a cheap(er) GPT4-comparable model than build a marginally better model that is even more expensive to scale. Like most tech there is a sigmoid curve where you start to see diminishing returns.


LLMs will be treated much like occupations for people going forward. Just like our own education, there will be a basic set of skills all LLMs need to be trained on to provide a minimally usable product. With GPT-4 I suspect we are at this point now, if not slightly beyond the optimal basic training stage. However, now it is time to send our LLMs off to university, so to speak, to specialise them in certain tasks and knowledge bases. The current approach of attempting to just build everything into one does not work for a human mind, so why would it work for a vector-based replication?

I suspect in the coming months we will hear more about tiny models trained on much smaller datasets and then specialised using a mix of adapters and LoRA modifications to excel at specific tasks like code generation, translation, and conversation. Then multiple models will be implemented in one application chain to best leverage each of their respective strengths.
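A minimal sketch of that LoRA-style specialisation using the Hugging Face peft library; the base model name and the hyperparameters are just illustrative:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # example base model

    config = LoraConfig(
        r=8,                                   # low-rank adapter dimension
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],   # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # only a tiny fraction of the weights are trained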


What age? Like, 3 years?

On the other hand, though, Chinchilla and multimodal approaches have already shown how later AIs can be improved beyond throwing petabytes of data at them.

It is all about variety and quality from now on, I think. You can teach a person all about the color zyra, but without ever actually seeing it, they will never fully understand that color.
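For reference, the Chinchilla result boils down to a rule of thumb of roughly 20 training tokens per parameter, which is easy to sanity-check (parameter counts below are arbitrary examples):

    # Back-of-envelope Chinchilla heuristic: ~20 training tokens per parameter.
    def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
        return n_params * tokens_per_param

    for n in (70e9, 175e9, 1e12):
        print(f"{n:.0e} params -> ~{chinchilla_optimal_tokens(n):.1e} tokens")
    # 7e+10 params -> ~1.4e+12 tokens; 2e+11 -> ~3.5e+12; 1e+12 -> ~2.0e+13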


It does seem, though, that using Chinchilla-like techniques does not create a copy with the same quality as the original. It's pretty good, for some definition of the phrase, but it isn't equivalent; it's a lossy technique.


I agree on the lossy part. There is a tradeoff between efficiency and comprehensiveness, kind of. It would be pretty funny if, in the end, the most optimal method turns out to be the brain we already have: extremely efficient, hardware-optimized, but slow as hell, and it misunderstands stuff all the time unless prompted with specific phrases.


Something kind of funny (but mostly annoying), about this announcement is the people arguing that OpenAI is, in fact, working on GPT-5 in secret.

To my knowledge, NFT/crypto hype never got so bad that conspiracy theories began to circulate (though I’m sure there were some if you looked hard enough).

Can’t wait for an AIAnon community to emerge.


Isn’t it obvious? Q is definitely an LLM, trained on trillions of words exfiltrated from our nation’s secure systems. This explains why it’s always wrong in its predictions: it’s hallucinating!


Yeah, I think what's happened to OpenAI is the same thing happening to all tech companies. They were spending a lot of money building these LLMs, and now that they've been told to cut costs they no longer have the funding to build the next "double the size of the last one" model.

Interestingly, what makes ChatGPT work is the size of the model, so I think they've found their dead stop.

Lots of the bullish comments have been talking about how ChatGPT is a bit shit right now but will get exponentially better, and I think the answer is that progress will now be much slower and more linear. That is, if they can stay funded, which is a very big if. As an org they are bleeding money.


"Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them."

An H100 has 80GB of VRAM. The highest-end system I can find is 8xH100. Is 640GB the biggest model you can run on a single system? GPT-4 is already throttled and has a waiting list, and they haven't even released image processing or integrations to a wide audience. Maybe they are just unable to do inference in a cost-efficient manner, and at an acceptable speed, on anything bigger than GPT-4?
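For a rough sense of the memory math (weights only, fp16, ignoring KV cache, activations, and parallelism overheads, and assuming nothing about GPT-4's undisclosed size):

    # Weights-only VRAM estimate; real inference also needs KV cache and activations.
    def weights_vram_gb(n_params: float, bytes_per_param: float = 2) -> float:
        return n_params * bytes_per_param / 1e9

    print(weights_vram_gb(175e9))        # 350.0 GB: a 175B fp16 model fits in 8x80GB = 640GB

    node_vram_gb = 8 * 80                # one 8xH100 box
    max_params = node_vram_gb * 1e9 / 2  # fp16, weights only
    print(f"~{max_params/1e9:.0f}B params max per node before sharding across nodes")  # ~320B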


You might not see it available via self-serve interfaces, but I think the H100 cards can do 256x to a single host.


You're right! You can NVLink 32 systems with 8 H100s each! That's 20 terabytes of VRAM total!


>Maybe they are just unable to do inference in a cost-efficient manner and at an acceptable speed on anything bigger than GPT-4?

Yes, that's the whole thing. As others have pointed out, GPT-4 seems like an optimum point that balances cost, ROI, etc... it won't improve much just by throwing more data at it.


Sorry, but this sounds a lot like "640KB is all the memory you will ever need." What about a "Socratic model" for video? There should be many applications that would benefit from a bigger model.


This is just pushback against Elon and crew's fake article about the dangers of AI; they specifically state the next versions will be ultra deadly.

Sam is now saying there will be no future model that will be as good.

This is all positioning to get regulators off the track because none of these control freaks in government actually understand a whit of this.

All said and done, this is all just an attempt to disempower the OSS community. But they can't; we're blowing past their barriers like the 90s did with the definition of slippery slope.


He isn't saying that. Only that different approaches are needed to produce better results. If he's confident to say this I think he must have something up his sleeve.


Bruv has to pay for the data he's been using, or soon there won't be any left to nick. Groupies claiming their AI is "intelligent", and not just a data-ingesting beast, will soon learn a hard lesson. Take your blogs offline, stop contributing content for free, and stop pushing code, or else chavs like this one will continue monetising your hard work. As did Bezos and many others that now want you to be out of a job.


I'm no expert, but doesn't the architecture of MiniGPT-4, which is on the front page right now, give some indication of what the future might look like?


Eh, I haven't personally found a use case for LLMs yet, given that you can't trust the output and it needs to be verified by a human (which might be just as time-consuming/expensive as actually doing the task yourself).


I’d reconsider the “might as well just be as time consuming” thing. I see this argument about Copilot a lot, and it’s really wrong there, so it might be wrong here too.

Like, for most of the time I’m using it, Copilot saves me 30 seconds here and there and it takes me about a second to look at the line or two of code and go “yeah, that’s right”. It adds up, especially when I’m working with an unfamiliar language and forget which Collection type I’m going to need or something.


> Like, for most of the time I’m using it, Copilot saves me 30 seconds here and there and it takes me about a second to look at the line or two of code and go “yeah, that’s right”.

I've never used Copilot but I've tried to replace StackOverflow with ChatGPT. The difference is, the StackOverflow responses compile/are right. The ChatGPT responses will make up an API that doesn't exist. Major setback.


Thing is, you can't trust what you find on stack overflow or other sources either. And searching, reading documentation and so on takes a lot of time too.

I've personally been using it to explore using different libraries to produce charts. I managed to try out about 5 different libraries in a day with fairly advanced options for each using chatGPT.

I might have spent a day in the past just trying one and not to the same level of functionality.

So while it still took me a day, my final code was much better fitted to my problem, with increased functionality. Not a time saver for me, then, but a quality enhancer, and I learned a lot more too.


> Thing is, you can't trust what you find on stack overflow or other sources either.

Eh. An outdated answer will be called out in the comments/downvoted/updated/edited more often than not, no?


Maybe, maybe not. I get useful results from it, but it doesn't always work. And it's usually not quite what I'm looking for, so then I have to go digging around to find out how to tweak it. It all takes time, and you do not get a working solution out of the box most of the time.


They're good for tasks where generation is hard but verification is easy. Things like "here I gesture at a vague concept that I don't know the name of, please tell me what the industry-standard term for this thing is" where figuring out the term is hard but looking up a term to see what it means is easy. "Create an accurate summary of this article" is another example - reading the article and the summary and verifying that they match may be easier than writing the summary yourself.


I've enjoyed using it for very small automation tasks. For instance, it helped me write scripts to take all my audiobooks with poor recording quality, split them into 59-minute chunks, and upload them to Adobe's free audio enhancement site to vastly improve the listening experience.
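For the curious, the splitting step can be a few lines around ffmpeg's segment muxer (the 59-minute figure is from the comment above; the folder layout is made up):

    # Split each audiobook into 59-minute chunks without re-encoding.
    # Assumes ffmpeg is on PATH; "audiobooks"/"chunks" paths are hypothetical.
    import subprocess
    from pathlib import Path

    CHUNK_SECONDS = 59 * 60
    Path("chunks").mkdir(exist_ok=True)

    for book in Path("audiobooks").glob("*.mp3"):
        subprocess.run(
            [
                "ffmpeg", "-i", str(book),
                "-f", "segment",
                "-segment_time", str(CHUNK_SECONDS),
                "-c", "copy",   # container-level split, no quality loss
                f"chunks/{book.stem}_%03d.mp3",
            ],
            check=True,
        )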


No? I use it all the time to help me, for example, read ML threads when I run into a term I don't immediately understand. I can do things like 'explain this at the level of a high school student'


The way I see it, the expensive part should be training the models via simulated architectures on GPUs or TPUs or whatever.

But once they are trained, is there a way to encode the base models into hardware so that inference costs are basically negligible? Hopefully somebody is looking into whether this is possible: using structurally encoded hardware to make inference costs basically nil/constant.


I don't understand why size is an issue in the way that is being claimed here.

Intelligence isn't like processor speed. If I have a model that has (excuse the attempt at a comparison) 200 IQ, why would it matter that it runs more slowly than a human?

I don't think that, for example, Feynman at half speed would have had substantially fewer insights.


We're not going to get a 200 IQ model by simply scaling up the current model, even with all the datacenters in the world running 24/7


I didn't think this article was very good. Sam Altman actually implied that GPT-5 will be developed when he spoke at MIT. And if Sam said that scaling is over (I doubt he said this, but I could be wrong), the interesting part would be the reasoning he provided for this statement; there's no mention of that in the article.


All warfare is based on deception -- Sun Tzu


Wait till someone pulls a Cerebras for MEMORY. And then we connect a wafer-scale processor with wafer-scale memory and BOOM. New mega-sized models emerge.

As for the current claim, it might have to do with the amount of time spent taming the wildness of answers from raw GPT-4. So the focus is shifting away from increasing the model size.


"...for the current cycle, in our specific public-facing market."

As most here well know, "over" is one of those words, like "never", which particularly in this space should pretty much always be understood as implicitly accompanied by a footnote restricting it to near-term scope.


An amusing thought I've had recently is whether LLMs are in the same league as the millions of monkeys at the keyboard, struggling to reproduce one of the complete works of William Shakespeare.

But I think not, since monkeys probably don't "improve" noticeably with time or input.


> But I think not, since monkeys probably don't "improve" noticeably with time or input.

Maybe once tons of bananas are introduced...


He's finally catching up with me. I wrote an article about that a few months ago.

https://twitter.com/igorcosta/status/1612704076315389952


I wonder how much the scarcity and cost of Nvidia GPUs is driving this message?

Nvidia is in a perfect "Arms Dealer" situation right now.

Wouldn't be surprised to see the next exponential leap in AI models trained on in-house proprietary GPU hardware architectures.


Google has been using TPUs for years and continuously improving the designs.


Dudes, if none of you see it, it's because you're turning a blind eye, is the best way I can say it. The future of the 50s is about to show up, if, and only if, we're not so dumb as to ignore it. Flying cars, killer robots: we can help decide, if we're only aware enough to do a bit of research and say anything. Your voice does matter :D

watch this: https://www.youtube.com/watch?v=VcVfceTsD0A&ab_channel=LexFr...

Then think, and then think again, and then comment. Be optimistic, but be forewarned, and speak your voice.


Maybe they're now training teams of AIs to work together? This way the hardware limitations are less of a barrier in terms of maximum memory per AI. They can run in parallel in separate hardware and communicate to reach higher goals.


Someone described to me an emerging rule called "Penelope's rule", where you scale by having the largest and most powerful model delegate to the less powerful models and then have the larger model improve upon the results (e.g. have GPT-4 delegate the more mundane tasks to GPT-3.5).
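I hadn't seen it written down anywhere, but the pattern is easy to sketch. Everything below is hypothetical: call_model stands in for whatever chat-completion client you actually use, and the model names are placeholders.

    # Hypothetical "delegate down, refine up" loop: a big model plans and reviews,
    # a cheaper model does the bulk of the drafting.
    def call_model(model: str, prompt: str) -> str:
        raise NotImplementedError("wire this up to your LLM API of choice")

    def delegate_and_refine(task: str) -> str:
        # 1. The big model breaks the task into mundane subtasks.
        plan = call_model("big-model", f"Split this task into small subtasks:\n{task}")

        # 2. The cheap model drafts each subtask.
        drafts = [
            call_model("small-model", f"Complete this subtask:\n{sub}")
            for sub in plan.splitlines() if sub.strip()
        ]

        # 3. The big model reviews and merges the drafts into a final answer.
        return call_model(
            "big-model",
            f"Task: {task}\nDrafts from an assistant:\n" + "\n".join(drafts)
            + "\nImprove and merge these into a final answer.",
        )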


Hm, all right. I'm guessing that huge models as a business may be over until the economics are figured out, but huge models as experts for knowledge distillation seem reasonable. And if you pay a super premium, you can use the huge model.


GPT4 already obsolete? Geez, we got the wrong bot, silly heads: https://www.youtube.com/watch?v=EdRMVhlNP5I


It's a good thing these bots don't have feelings yet, because there are going to be a lot of obsolete bots once we're done, and getting made obsolete isn't very fun.


I call bullshit. There will be bigger and better models. The question is not whether big companies will invest in training them (they will), but whether they’ll be made available to the public.


Those stochastic parrots are done for, finally! I have been telling people AI is a lie and it’s all a scam and now you can see for yourself. Buy nice warm jackets guys, winter is coming.


Sounds like we will be on GPT-4 for a while, while we wait out a new "AI winter". Mind you, there is still a lot to squeeze from GPT-4 and from local/edge LLM models.


it's over, billions of parameters must be released


Soon, the EU will pass a law that forces any AI company to provide you with a printout of the entire model upon request.


Hmmmmmm, I know nothing about this but… guy who spends a bunch of money on a thing to sell says others shouldn't do it.

Sounds sus

Sorry I know this adds little to the conversation


There will still be giant AI models, they will just be improved by other means than making them bigger. Subtle but important distinction.


Small AI model != cheap AI model.

It costs the same to train as these giant models. You merely spend the money on training it for longer instead of making it larger.
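That follows from the usual C ≈ 6·N·D training-FLOPs approximation: at a fixed compute budget you can trade parameters for tokens (the configurations below are purely illustrative):

    # Training-compute rule of thumb: FLOPs ≈ 6 * N (params) * D (tokens).
    def train_flops(n_params: float, n_tokens: float) -> float:
        return 6 * n_params * n_tokens

    big_and_short = train_flops(175e9, 300e9)   # larger model, fewer tokens
    small_and_long = train_flops(70e9, 750e9)   # smaller model, many more tokens
    print(f"{big_and_short:.2e} vs {small_and_long:.2e}")  # both 3.15e+23 FLOPs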


The most comforting AI news I have read this year.


If you are worried about AI, this shouldn't make you feel a ton better. GPT-4 is just trained to predict the next word, a very simple but crude approach, and look what it can do!

Imagine when a dozen models are wired together and giving each other feedback with more clever training and algorithms on future faster hardware.

It is still going to get wild


Machine learning is actually premised on being "simple" to implement. The more priors you hardcode with clever algorithms, the closer you get to what we already have. The point is to automate the process of learning. We do this now with relatively simple loss functions and models containing relatively simple parameters. The main stipulation is that they are all defined to be continuous, so that you can use the chain rule from calculus to compute the gradient of the error with respect to every parameter without it taking so long that it would never finish.

I agree that your suggested approach of applying cleverness to what we have now will probably produce better results. But that’s not going to stop better architectures, hardware and even entire regimes from being developed until we approach AGI.

My suspicion is that there’s still a few breakthroughs waiting to be made. I also suspect that sufficiently advanced models will make such breakthroughs easier to discover.
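As a toy illustration of the "continuous parameters plus chain rule" point above, this is all backpropagation is at a small scale (a PyTorch sketch, unrelated to any particular LLM):

    # A continuous loss lets autograd push the error back to every parameter.
    import torch

    w = torch.randn(3, requires_grad=True)   # three "parameters"
    x = torch.tensor([1.0, 2.0, 3.0])
    target = torch.tensor(10.0)

    loss = ((w * x).sum() - target) ** 2     # simple differentiable loss
    loss.backward()                          # chain rule fills in w.grad

    with torch.no_grad():
        w -= 0.01 * w.grad                   # one gradient-descent step
        w.grad.zero_()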


I have repeatedly argued against this notion of "just predicting the next word". No. It's completing a conversation. It's true that it is doing this word by word, but it's kind of like saying a CNN is just predicting a label. Sure, but how? It's not doing it directly. It's doing it by recovering a lot of structure and, in the end, boiling that down to a label. Likewise, a network trained to predict the next word may very well have worked out the whole sentence (implicitly, not as text) in order to generate the next word.


I actually have high hopes for the hybrid architecture Ben Goertzel has been working on at OpenCog. I think the LLMs are soon going to hit an S-curve without the introduction of additional scientific knowledge like physics and a notion of energy (with respect to AGI development; they'll still be good for tonnes of other jobs, displacing things).


I worry that the hardware requirements are only going to accelerate the cloud-OS integration. Imagine a PC that's entirely unusable offline.


> Imagine a PC that's entirely unusable offline.

FWIW we had thin clients in computer labs in middle school / high school 15 years ago (and still today these are common in enterprise environments, e.g. Citrix).

Biggest issue is network latency which is limited by the speed of light, so I imagine if computers in 10 years require resources not available locally it would likely be a local/cloud hybrid model.
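To put rough numbers on the speed-of-light point (signal speed in fiber is about 200,000 km/s; the distances are arbitrary examples):

    # Minimum round-trip propagation delay over fiber, ignoring routing and queuing.
    def rtt_ms(distance_km: float, fiber_speed_km_s: float = 200_000) -> float:
        return 2 * distance_km / fiber_speed_km_s * 1000

    for d in (50, 500, 5000):
        print(f"{d} km away -> ~{rtt_ms(d):.1f} ms minimum round trip")
    # 50 km -> 0.5 ms; 500 km -> 5 ms; 5000 km -> 50 ms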


Personally, I'm less worried about AI than I am about what people using these models can do to others. Misinformation/disinformation, more believable scams, stuff like that.


> Imagine when a dozen models are wired together...

Wouldn't these models hallucinate more than normal, then?


People think something magical happens when AI are wired together and give each other feedback.

Really you’re still just predicting the next word, but with extra steps.


People think that something magical happens when transistors are wired together and give each other feedback.

Really you're just switching switches on and off, but with extra steps.


I agree, probably for a different reason. IMO the fact that Sam is saying this likely means that the LLMs are close to the upper knee of the S-curve, and after GPT-5 they likely don't have many new fundamental ideas (additional S-curves) to throw at it. Since they are ahead of the rest, it likely means we might be headed for an AI improvement pause for a couple of years after GPT-5.

Still good enough to upset the balance in the search/ad market. Interesting times.


> it likely means we might be headed for an AI improvement pause for a couple of years after GPT-5.

I suspect that a pause in base LLM performance won’t be an AI improvement pause; there’s a whole lot of space to improve the parts of AI systems around the core “brain in a jar” model.


I agree, there will be other things to improve in AI systems, but IMHO (tea-leaf reading, really) it would only lead to incremental improvements in overall systems. Also, there is a lot of 'interfacing' work that needs to happen, and I suspect that would end up filling the pause, which really is LLM productization, loosely speaking.

As far as AGI is concerned, I don't believe LLMs are really the right architecture for it; AGI likely needs some symbolic logic and a notion of physicality (i.e. physical laws & energy/power).


> but IMHO (tea-leaf reading, really) it would only lead to incremental improvements in overall systems.

It will reach a point where that is the case, sure; it is not there now, and if we are within one model generation of exhausting (for now) major core model improvements, I don’t think we’ll have reached the point of gradual incremental improvement from rest-of-system improvements yet.


Title is misleading lol. Plenty of scale room left.


To those in the know, what are the potential successors to the Transformer architecture?



"640K of memory is all that anybody with a computer would ever need"


No one will ever need more than 64k moment or whatever it was


42

train the entire model for a single question.


Now we need another letter


Dude, someone lied their way into being president of the United States, all while people fact-checked him basically immediately after each lie. Lying doesn't make a difference.


"Eschew flamebait. Avoid generic tangents."

https://news.ycombinator.com/newsguidelines.html

We detached this subthread from https://news.ycombinator.com/item?id=35606550.


He’s not presenting false evidence here, he’s presenting a hunch. It’s a guess. No one is going to gain anything from this one way or another.


> No one is going to gain anything from this one way or another.

Then why did Altman even bother to make this very public statement?


He was a guest on an AI-related discussion panel [1]. I did not attend, but I would guess someone asked him "do you think models will have to continue growing to even larger sizes," or something like that, and this was his answer.

Honestly, that anyone thinks he has “ulterior motives” here is hilarious to me. Every day I think more and more that we can no longer think critically on the internet.

[1] https://www.imaginationinaction.co/


CEOs often make public statements for recruiting and to steer the company internally; it's important to mimic outside the company what you say inside it, maybe even more so.


Altman's made it clear he does not mind being wrong.

Smart people are allowed to make mistakes.


To answer the people asking when GPT5 is coming out, probably.


He is using his authority to put an idea in the minds of others. That idea is:

"don't bother competing with me. i have already won."


[flagged]


It's usually a good thing to keep recent presidents and near future presidential candidates in memory.

This isn't about some dumb grudge.


Some of us prefer to surf the internet without reddit leaking everywhere.


"rent free" is reddit leaking much more than bringing up a politician for a complaint.


So your response is "no u"

Ok, cool.


A little bit.

It's not like an accusation of hypocrisy is automatically wrong, or anything.

It really confuses me why you think such a mundane political statement is a reddit thing. Especially if you think the rent free thing isn't reddity, in which case you're not drawing with a super broad brush in a way I could at least understand.


[flagged]


Please don't take HN threads further into flamewar. It's not what this site is for, and destroys what it is for.

https://news.ycombinator.com/newsguidelines.html

Edit: Could you please stop posting unsubstantive comments and flamebait generally? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for. We ban accounts that keep posting like this:

https://news.ycombinator.com/item?id=35608465

https://news.ycombinator.com/item?id=35599412

https://news.ycombinator.com/item?id=35587730


So your embrace of nihilism is total. Here are a couple of books worth reading:

- On Tyranny: https://bookshop.org/p/books/on-tyranny-twenty-lessons-from-...

- The Road to Unfreedom: https://bookshop.org/p/books/the-road-to-unfreedom-russia-eu...


Please don't take HN threads further into flamewar. It's not what this site is for, and destroys what it is for.

https://news.ycombinator.com/newsguidelines.html


[flagged]


Please don't take HN threads further into flamewar. It's not what this site is for, and destroys what it is for.

https://news.ycombinator.com/newsguidelines.html


You're confused about who's who.


These kinds of comments have me update my priors, so I downgrade them in relevance. Two days ago we were talking about AGI, and Altman was letting himself be coddled, knowing full well it was mindless hype. Now he's trolling the other way. Ok. My weights have been updated.



