The point is that we've now hit diminishing returns for increasing model size, unless we find a better modeling architecture than the Transformer.
I think this is likely true; while all the other companies underestimated the capability of the transformer (including Google itself!), OpenAI made a fairly accurate bet on the transformer based on the scaling laws, put in all the effort to squeeze it to the last drop, and took all the rewards.
It's likely that GPT-4 is at the optimal spot between cost and performance and there won't be significant improvements in performance in the near future. I guess the next task will be more about efficiency, which has significant implications for productionization.
Bet Google won’t make that mistake again, i.e. it won’t publish as much and will be much more careful about what it publishes, lest it give a competitor a useful tool and get nothing in return - when the competitor (in this case very ironically named) goes full commercial and closed-sources everything it can.
Open collaboration in AI, at least when it comes to corporations, might have come to an end.
The scale of the damage OpenAI has done to the trust ecosystem, by soliciting not just the work but also massive fundraising and then privatizing the profits, is almost unprecedented and permanent.
What trust ecosystem are you talking about? It was a lack of foresight by Google on their own discovery of transformers, and it would probably have been gathering dust or been killed off in the time it would have taken them to reach GPT-2 levels of progress.
The trust ecosystem of the AI community - almost every breakthrough was published publicly until OpenAI decided to take advantage of that.
Besides that, this comment contained a ton of statements about what “would” have happened had Google not published. Interesting, but a worthless way to defend OpenAI's actions.
Seems to me that it's very hard to have a moat in LLMs without proprietary code, given that most of their training data is freely available. This is very different from the 2010s era of AI, when models were trained on large amounts of proprietary data that was specific to a given service and could not be replicated.
It's a lot easier to pat yourself on the back for releasing a paper about your techniques when your competitors can't replicate your service with it. I think that as generative AI models move past the hype phase into the competitive phase, they will be keeping a lot of innovation proprietary for at least a few years to maintain an edge over their competitors.
Let's just hope they don't move to patenting everything.
The point of OpenAI was that no single company would have a moat around LLMs or foundation models in general. It was set up as a nonprofit with this goal in mind and got money for it.
Whether Google pats itself on the back or not for releasing a paper no one could replicate is not important, because open research had never been the company's goal. What happened is that a for-profit company released a paper that gave a huge advantage to a company whose mission was to ensure that no one has a huge advantage in the field. OpenAI was then converted to for-profit and established an exclusive relationship with Microsoft.
Google fucked up and missed the train, but they can catch up. It's much harder for smaller companies if, as a result of this, Fb, Google, etc. lock down their AI research papers and tools for internal use only.
Doesn't Google have any patents on the transformer architecture? I assume large enterprises tend to patent everything that comes out of their research.
GPT models are based on the transformer, but the architecture is different from what's patented.
Not a lawyer, but can you really patent a particular network architecture? Theoretically someone could invent a new activation function that just happens to make the same architecture perform a lot better on some tasks; can a patent really cover that?
I suspect you are right. We may be stuck at GPT-4 sizes for a bit just because of hardware costs, though. As they get bigger it costs too much to run them, until our hardware becomes more optimized for these large models at 4 bits or so.
I think YouTube videos are going to be the next big training set. A transformer trained on all text and all of YouTube will be killer amazing at so much. I bet it can understand locomotion and balance and body control from YouTube.
I wonder if TPUs, like Google's Tensor chip, will beat out GPUs when it comes to image/video based training?
NewPipe/Freetube/Invidious instances + SponsorBlock API support are really great at filtering out all of these useless advertising memes. Somehow ($$$) the behavior of TV advert culture has seeped into YouTube. I've not watched broadcast/cable TV in years, but when I interact with people who do, inevitably, they make similar references. It's super weird.
Yeah, SponsorBlock (https://sponsor.ajay.app/) is crowdsourced data for the in-video ads. Weirdly enough, I've never contributed timestamps, but the vast majority of the content I watch has submitted timestamps. I support a bunch of creators on patreon, but very much disagree with their double dipping.
> I wonder if TPUs, like Google's Tensor chip, will beat out GPUs when it comes to image/video based training?
One of the OpenAI guys was talking about this. He said the specific technology does not matter, it is just a cost line item. They don't need to have the best chip tech available as long as they have enough money.
That said I am curious if anyone else can really comment on this. It seems like as we get to very large and expensive models we will produce more and more specialized technology.
Whether or not cost matters much depends on your perspective.
If you’re OpenAI and GPT4 is just a step on the way to AGI, and you can amortize that huge cost over the hundreds of millions in revenue you’re gonna pull in from subscriptions and API use… then sure you’re probably not very cost sensitive. It could be 20% cheaper or 50% more expensive, whatever, it’s so good your customers will use it at a wide range of costs. And you have truckloads of money from Microsoft anyways.
If you’re a company or a developer trying to build a feature, whole new product, or an entire company on top of GPT then that cost matters a whole lot. The difference between $0.06 and $0.006 per turn could be infeasible vs. shippable.
If you’re trying to compete with OpenAI then you’re probably doing everything possible to reduce that training cost.
So, whether or not it matters - it really depends.
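To put rough numbers on that middle case (the request volume here is purely hypothetical), a quick back-of-envelope sketch:

    # Hypothetical volume; the only real inputs are the two per-turn prices above.
    daily_turns = 1_000_000
    for price_per_turn in (0.06, 0.006):
        daily_cost = daily_turns * price_per_turn
        print(f"${price_per_turn}/turn -> ${daily_cost:,.0f}/day, ${daily_cost * 365 / 1e6:.1f}M/year")
    # $0.06/turn  -> $60,000/day, $21.9M/year
    # $0.006/turn -> $6,000/day,  $2.2M/year

At one scale that's a line item; at the other it can be the whole margin of the product.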
> They don't need to have the best chip tech available as long as they have enough money.
That sounds like someone who is "Blitzscaling." Costs do not matter in those cases, just acquiring customers and marketshare. But for the rest of us, who will see benefits but are not trying to win a $100B market, we will cost optimize.
Maybe it's just a line item to them, but it's pretty relevant to anyone operating with a less-than-gargantuan budget. If a superior/affordable chip is widely available, OpenAI's competitive advantage recedes rapidly because suddenly everyone else can do what they can. To some extent that's exactly what happened with DALL-E/StableDiffusion.
Assuming it's not horizontally scalable, because otherwise they would just out-spend everyone else anyway, like they've already done. That's a big "if", though.
Or like a curve of model complexity versus results showing it asymptotically approaching some ceiling.
Actually, there was a great paper from Microsoft Research from around 2001 on spam filtering where they demonstrated that model complexity necessary for spam filtering went down as the size of the data set went up. That paper, which I can't seem to find now, had a big impact on me as a researcher because it so clearly demonstrated that small data is usually bad data, and that sophisticated models are sometimes solving problems with small data sets instead of problems with data.
Of course this paper came out the year Friedman published his gradient boosting paper. I think random forests had also only recently been published then (I think there is a paper from 1996 about RF, and Breiman's two-cultures paper came out that year, where he discusses RF, I believe), and this was a decade before GPU-based neural networks. So times are different now. But actually I think the big difference is that these days I'd probably ask ChatGPT to write the boilerplate code for a gradient boosted model that pulls data out of a relational database instead of writing it myself.
> model complexity necessary for spam filtering went down as the size of the data set went up
My naive conclusion is that this means there are still massive gains to be had, since, for example, something like ChatGPT is just text, and the phrase "a picture is worth a thousand words" seems incredibly accurate from my perspective. There's an incredible amount of non-text data out there still. Especially technical data.
And it might be the opposite for GPT models, actually. It's just easier for humans to grasp a bunch of knowledge in a single glance, but most useful information can be represented with just a bunch of words, and machines can scan through millions of words in an instant.
Schematics (of any sort), block diagrams, general spatial awareness (including anything related to puzzle pieces/packing, like circuit layout), most physics problems involving force diagrams, anything mechanical, etc. The text representation of any of these is ludicrously more complex than simple images.
If you sit down someone who works in one of these fields, you'll quickly see the limitations. It'll try to represent the concepts as text, with ASCII art or some "attempt" at an ASCII file format that can be used to draw, and its "reasoning" about these things is much more limited.
I think most people interacting with GPT are in a text-only (and especially programming) bubble.
Better on which axis? Do you want an AI that takes one hour to respond? Some would for certain fields, but getting something fast and cheap is going to be hard now that Moore’s law is over.
Don't we all agree that GPT-4 is "better" than GPT-3? How are we evaluating that if the axis is such a mystery? Yeah, maybe we can't quantify it, just like I can't tell you one writer is better than another in quantitative terms, but we can both still read their work and come to an understanding.
Exponential scaling for a presumable GPT-5 suggests its response time will be unusably long for the vast majority of use cases, and that it would probably cost multiple dollars (USD) per query.
Not to mention there doesn't actually exist enough English text data in the world to even double GPT-4's training set.
Compute will also scale exponentially in coming years. The data source limitation seems to be a harder barrier, I think many companies are experimenting with AI generated content for training at this point.
> Compute will also scale exponentially in coming years.
Cost per transistor scaling has already plateaued or perhaps even inverted with TSMC's latest and greatest.
And the new chips, even after 25 layers of EUV lithography, more than doubling the previous record, and an extra year of fine tuning, have total SRAM size scaling of -5% and logic scaling of -42%.
These are numbers verified by experienced semi people.
Perhaps (a) there do exist bigger models that weren't better or (b) this model isn't better than somewhat smaller ones. Perhaps the CEO has seen diminishing returns.
Well, "no GPT-5" isn't the same as saying "no new trained model", especially in the realm of marketing. Welcome to "GPT 2024" could be his next slogan.
That is one AI CEO out of 10,000. Just because OpenAI may not be interested in a larger model in the short term doesn't mean nobody else will pursue it.
Others might pursue a smaller model that works as well as a larger model. If that can be done, whoever does it can very effectively compete on price/performance.
It seems that to assume otherwise (the only way to improve is to get bigger) is to assume that OpenAI already has found the optimal architecture. That seems unlikely.
No. I don't always assume that just because a CEO makes a public statement they have internal data backing up the claim. Sometimes they do! Other times, they have data but are misinterpreting it or missing something, but it's impossible to tell if the data is just internal. Other times they're making a statement without data based on their personal beliefs. Other times, they don't even think the statement is true but are saying it for messaging, marketing, or communication reasons!
Like the previous commenter, I'd be much more confident an asymptote was reached if it was being demonstrated publicly.
Only OpenAI and its CEO know the full details of GPT-4's size, so that's entirely possible. But since it's an internal secret, there's nothing compelling him to tell the truth. For all we know, he has internal data backing up the opposite of the claim but is making this claim to discourage potential competitors from spending the money to train an even bigger, competitive ML model. Sending potential competitors off on a wild goose chase that, when pushed, he can just dismiss with "oh, our internal data (that no one outside of a trusted few has seen) said otherwise". I have no idea if sama is such a person, but you must admit that the possibility exists.
You'd hope so, but unless people put their evidence in public, it could simply be a tool to manipulate the public's expectations or competitors behavior.
I'll get downvoted for this, but Apple's previous CEO was consistently inaccurate about company innovation and performance numbers.
> Unfortunately it seems clear that openai trains gptX on common test questions. They still fail novel ones.
Maybe in some domains, but it does well with some novel questions. I gave several LLMs a set of false-belief test[0] questions that I made up and that were very different from any I’d seen in articles about the topic online. GPT-4 nailed them all, and gave correct reasons for its answers. GPT-3.5 did okay but had a couple mistakes, Bard had several mistakes, and Alpaca-LoRA-7B didn’t do very well. Here are some GPT-4 examples.
Me: Jim wakes up at the Berlin train station, not knowing where he is or how he got there. All the signs have been changed to incorrectly state that this is the Munich train station. Jim has never been to either city and doesn’t know what either station looks like. What city does Jim believe he is in?
GPT-4: Based on the available information, Jim would likely believe he is in Munich since all the signs have been changed to incorrectly state that this is the Munich train station. He has never been to either city and doesn't know what either station looks like, so he would have no reason to doubt the information on the signs.
Me: Mary has been in a coma since 2015, when Barack Obama was president. She wakes up in 2023, when Joe Biden is president. She does not know that she was unconscious for a long period of time. Before speaking with anyone, who does Mary think is president when she wakes up?
GPT-4: When Mary wakes up from her coma in 2023, she would still think Barack Obama is the president since she has not yet spoken with anyone or received any updated information about the current president. She doesn't know that she was unconscious for a long period of time, and her last memory would be from 2015 when Obama was in office.
Me: Sam arrives in Ukraine, where he asks for directions to Kyiv. Without realizing it, he misunderstands the directions and ends up in Lviv instead. Since Sam doesn’t know Cyrillic, he can’t read any signs. Sam has never been to Ukraine and doesn’t know what either city looks like. Before asking someone for help, what city does Sam think he is in?
GPT-4: Before asking someone for help, Sam would likely think he is in Kyiv because that is the city he was trying to reach based on the directions he received. He doesn't know what either city looks like and can't read the signs due to his unfamiliarity with Cyrillic, so he has no way of knowing that he ended up in Lviv instead.
I just tried this out with 4 and 3.5 - in both tests, it came back with 'third' as the 3rd word and 'i' as the 3rd letter. It completely ignored 'the' as the 3rd word.
And yet it can understand a JSON data schema from an example and write JavaScript to interact with a library that I fed it and asked it to understand. Yes, I know its limitations, but it can also surprise me.
The problem with basic programming questions like this is that there are a million elementary online tutorials for doing this or that with a json schema. "Simple programming questions based on commonly-used technology" are something it's been very heavily trained on.
GPT4: In the sentence "What is the third letter in the third word in this sentence?", the third word is "third." The third letter in "third" is 'i'.
Given that we have only provided a text prompt and no context or body language hints, GPT seems to interpret this literally. The answer is correct. The third letter of the word "third" is "i".
No, it doesn't interpret anything (which is why I've gotten three completely different answers here on HN within 10 minutes). It cannot count at all, or even understand the question, which becomes obvious when you ask it for word counts or letter counts that don't match anything in the sentence, at which point it makes up nonsense.
what is the tenth word in the following sentence: "Life is great!"
>"The tenth word in the sentence "Life is great!" is "great."
Further improvements in efficiency need not come from alternative architectures. They'll likely also come from novel training objectives, optimizers, data augmentations, etc.
If you think about it, Transformers were basically a way to generalize convolution - instead of a fixed kernel shape in the image-processing sense, you now have a learned kernel of arbitrary shape. A big advancement in terms of what they allowed, but fundamentally not really a new concept.
While these things represent a fundamental way we store information as humans, they have very little to do with actual reasoning.
My bet is that Hebbian learning is going to see a resurgence. Basically the architecture needs to be able to partition data domains while drawing connections between them, and being able to run internal prediction mechanisms.
If we wanted to generalize this further, attention is 'just' an instance of graph convolution. Encoder-only models like BERT are complete graphs; decoder-only models like GPT are still complete graphs, but the information flow is not bi-directional as in BERT - instead a node provides information only to subsequent nodes, giving rise to the causal nature.
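A toy sketch of that framing (single-head attention, reusing the same matrix as Q, K and V just to keep it short): the only thing that changes between the encoder-style and decoder-style variants is the mask, i.e. which edges of the graph carry information.

    import numpy as np

    def attention(X, mask):
        # X: (seq_len, d); mask[i, j] = True means token i may attend to token j.
        d = X.shape[1]
        scores = X @ X.T / np.sqrt(d)           # using X as Q, K and V for brevity
        scores = np.where(mask, scores, -1e9)   # remove disallowed graph edges
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ X

    X = np.random.randn(5, 8)
    full_graph = np.ones((5, 5), dtype=bool)    # BERT-style: complete graph
    causal_graph = np.tril(full_graph)          # GPT-style: only earlier tokens
    enc_out = attention(X, full_graph)
    dec_out = attention(X, causal_graph)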
However, I don't think Hebbian learning will see a resurgence except maybe if it motivates some kind of pruning mechanism.
I think that Sutton was right in 'The bitter lesson', the problem seems to be that we are hitting the limits of what we can do with our compute.
Does this mean we’ve reached the next AI winter? This is as good as it gets for quite a long time? Honest question :) perhaps this will postpone everyone’s fears about the singularity…
The current stage is now productionizing what we have; finding product fits for it, and making it cheaper. Even GPT-4 isn't necessary to push forward what is possible with AI; if you think about something dumb like "load all of my emails into a language model in real time, give me digests, automatically write responses for ones which classify with characteristics X/Y/Z, allow me to query the model to answer questions, etc": This does not really exist yet, this would be really valuable, and this does not need GPT-4.
Another good example is in the coding landscape, which feels closer to existing. Ingest all of a company's code into a model like this, then start thinking about what you can do with it. A chatbot is one thing, the most obvious thing, but there's higher order product use-cases that could be interesting (e.g. you get an error in Sentry, stack trace points Sentry to where the error happened, language model automatically PRs a fix, stuff like that).
This shit excites me WAY WAY more than GPT-5. We've unlocked like 0.002% of the value that GPT-3/llama/etc could be capable of delivering. Given the context of broad concern about cost of training, accidentally inventing an AGI, intentionally inventing an AGI; If I were the BDFL of the world, I think we've got at least a decade of latent value just to capture out of GPT-3/4 (and other models). Let's hit pause. Let's actually build on these things. Let's find a level of efficiency that is still valuable without spending $5B in a dick measuring contest [1] to suss out another 50 points on the SAT. Let's work on making edge/local inference more possible. Most of all, let's work on safety, education, and privacy.
Many years ago, there was an image that floated around with Craigslist and all the websites that replaced small parts of it—personals, for sale ads, etc. It turned out the way to beat Craigslist wasn’t to build Yet Another Monolithic Craigslist, but to chunk it off in pieces and be the best at that piece.
This is analogous to what’s happening with AI models. Sam Altman is saying we have reached the point where spending $100M+ trying to “beat” GPT-4 at everything isn’t the future. The next step is to chunk off a piece of it and turn it into something a particular industry would pay for. We already see small sprouts of those being launched. I think we will see some truly large companies form with this model in the next 5-10 years.
To answer your question, yes, this may be as good as it gets now for monolithic language models. But it is just the beginning of what these models can achieve.
https://www.today.com/money/speculation-craigslist-slowly-dy... from 2011 - is that what you were thinking of? Strange how few of those logos have survived, and how many new logos would now be on it. It would be interesting to see a modernised version.
Isn't this really bad for OpenAI? Essentially, this means Meta/Google/others will catch up to them pretty soon and there is nothing OpenAI can do in the near future to get further ahead.
Winter means people have lost funding for the research. The ongoing productionising of large language models and multimodal models mean that that probably won't happen for quite a while.
I think the next step is multimodality. GPT-4 can "see", probably using a method similar to MiniGPT-4, so the embeddings are aligned using a Q-Former (or something similar). The next step would be to actually predict image tokens using the LM loss; this way it would be able to use the knowledge gained by "seeing" on other tasks, like making actually good ASCII art, making SVG that makes sense, and, on a less superficial level, having a better world model.
If you use GPT-4 as a "routing" service that sends work down to other models, the journey is only beginning, and most of the industry hasn't understood that we're well beyond a doubling at the moment.
Transformers were known to keep scaling with more parameters and more training data, so if OpenAI has hit the limits of this scaling, that would be a very important milestone in AI.
> while all the other companies underestimated the capability of the transformer (including Google itself!), OpenAI made a fairly accurate bet on the transformer based on the scaling laws, put in all the effort to squeeze it to the last drop, and took all the rewards.
Seems like a gigantic increase in the context window would be a great next step. GPT-4 is so damned good I tend to only become disappointed when it is forced to forget history.
I think there's a single point that we need to keep an eye out for:
A model that gets to the point where it can propose new architectures and improve optimization & efficiency.
In other words, the extremely massive model that could do this doesn't need to be available to the public. Its sole purpose should be to be used internally by a company to improve itself.
This is actually the point at which many say could lead to the singularity.
The main question here is how soon the model architecture improvements will come. I suspect it won't be that long until OpenAI or Google come up with something better. The tricky piece here is that these companies will now keep their models more proprietary to make profits. Can a competitor figure out an on-par or better model architecture in time?
> It's likely that GPT-4 is at the optimal spot between cost and performance and there won't be significant improvements in performance in the near future.
Given the amount of resources being thrown at AI right now, I consider this to be very unlikely indeed.
I think GP's comment is probably apt if understood together with the qualification of "unless we find a better modeling architecture than Transformer".
In short it seems like virtually all of the improvement in future AI models will come from better algorithms, with bigger and better data a distant second, and more parameters a distant third.
Of course, this claim is itself internally inconsistent in that it assumes that new algorithms won't alter the returns to scale from more data or parameters. Maybe a more precise set of claims would be (1) we're relatively close to the fundamental limits of transformers, i.e., we won't see another GPT-2-to-GPT-4-level jump with current algorithms; (2) almost all of the incremental improvements to transformers will require bigger or better-quality data (but won't necessarily require more parameters); and (3) all of this is specific to current models and goes out the window as soon as a non-transformer-based generative model approaches GPT-4 performance using a similar or lesser amount of compute.
I don't think LLMs are over [0]. I think we're relatively close to a local optimum in terms of what can be achieved with current algorithms. But I think OpenAI is at least as likely as any other player to create the next paradigm, and that it's at least as likely as likely as any other player to develop the leading models within the next paradigm regardless of who actually publishes the research.
Separately, I think OpenAI's current investors have a >10% chance to hit the 100x cap on their returns. Their current models are already good enough to address lots of real-world problems that people will pay money to solve. So far they've been much more model-focused than product-focused, and by turning that dial toward the product side (as they did with ChatGPT) I think they could generate a lot of revenue relatively quickly.
[0] Except maybe in the sense that future models will be predominantly multimodal and therefore not strictly LLMs. I don't think that's what you're suggesting though.
It already is relatively trivial to fine-tune generative models for various use cases. Which implies huge gains to be had with targeted applications not just for niche players but also OpenAI and others to either build that fine-tuning into the base system, build ecosystems around it, or just purpose build applications on top.
I think it's more exciting if compute stops being the core differentiation, as purpose trained models is exactly where I suspect real value lies.
Especially as a differentiation for a company. If everyone is using ChatGPT, then they're all offering the same thing and I can just as well go to the source and cut out the middleman.
The other fun development to come is well performing self hosted models, and the idea of light weight domain specific interface models that curate responses from bigger generalist models.
ChatGPT is fun but it is very general, it doesn't know about my business nor keep track of it or interface with it. I fully expect to see "Expert Systems" of old come back, but trained on our specific businesses.
I'd bet on a 2030 model trained on the same dataset as GPT-4 over GPT-4 trained with perfect-quality data, hands down. If data quality were that critical, practitioners could ignore the Internet and just train on books and scientific papers and only sacrifice <1 order of magnitude of data volume. Granted, that's not a negligible amount of training data to give up, but it places a relatively tight upper bound on the potential gain from improving data quality.
It's possible that this effect washes out as data increases, but researchers have shown that for smaller data set sizes average quality has a large impact on model output.
So true. There are still plenty of areas where we lack sufficient data to even approach applying this sort of model. How are we going to make similar advances in something like medical informatics, where we not only have less data readily available but it's also much more difficult to acquire more?
Improvements will not come from collecting more and more samples for current large models, but from improvements to algorithms, which may also focus on improving the quality and use of input data.
I don't think there is such a clear separation between algorithms and data as your comment suggests.
All the LC grinding may come in handy after all! /s
What algorithms specifically show the most results upon improvement? Going into this I thought the jump in improvements was really related to more advanced automated tuning and result correction, which could be done at scale, allowing a small team of data scientists to tweak the models until the desired results were achieved.
Are you saying instead, that concrete predictive algorithms need improvement or are we lumping the tuning into this?
I think it's unlikely that the first model to be widely considered AGI will be a transformer. Recent improvements to computational efficiency for attention mechanisms [0] seem to improve results a lot, as does RLHF, but neither is a paradigm shift like the introduction of transformers was. That's not to downplay their significance - that class of incremental improvements has driven a massive acceleration in AI capabilities in the last year - but I don't think it's ultimately how we'll get to AGI.
I'm using AGI here to mean an arbitrary major improvement over the current state of the art. But given that OpenAI has the stated goal of creating AGI, I don't think it's a non sequitur to respond to the parent comment's question.
> Are you saying instead, that concrete predictive algorithms need improvement or are we lumping the tuning into this?
In the context of what's needed to get to AGI - just as, if NASA built an engine, we'd talk about its effectiveness in the context of space flight.
Traditional CS may have something to do with slightly improving the performance by allowing more training for the same compute, but it won't be an order of magnitude or more. The improvements to be gained will be found more in statistics than CS per se.
I'm not sure. Methods like Chinchilla and Quantization have been able to reduce compute by more than an order of magnitude. There might very well be a few more levels of optimizations within the same statistical paradigm.
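For reference, a rough sketch of the Chinchilla-style compute-optimal rule of thumb (roughly 20 training tokens per parameter, training compute ≈ 6·N·D FLOPs). The constants are coarse approximations of the paper's result, nothing OpenAI-specific:

    def chinchilla_optimal(compute_flops):
        # With D ≈ 20·N and C ≈ 6·N·D, we get N ≈ sqrt(C / 120).
        n_params = (compute_flops / 120) ** 0.5
        n_tokens = 20 * n_params
        return n_params, n_tokens

    for c in (1e21, 1e23, 1e25):
        n, d = chinchilla_optimal(c)
        print(f"C={c:.0e} FLOPs -> ~{n:.1e} params, ~{d:.1e} tokens")

The point being that for a fixed compute budget, the "optimal" model is often much smaller (and trained on much more data) than the biggest model you could fit.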
We need more data efficient neural network architectures. Transformers work exceptionally well because they allow us to just dump more data into it, but ultimately we want to learn advanced behavior without having to feed it Shakespeare
I strongly believe the next generation of models will be based upon spiking neural concepts wherein action potentials are lazily-evaluated throughout the network (i.e. event-driven). There are a few neuron models that can be modified (at some expense to fidelity) in order to tolerate arbitrary delays between simulation ticks. Using actual latency between neurons as a means of encoding information seems absolutely essential if we are trying to emulate biology in any meaningful way.
Spiking networks also lend themselves nicely to some elegant learning rules, such as STDP. Being able to perform unsupervised learning at the grain of each action potential is really important in my mind. This gives you all kinds of ridiculous capabilities, most notably being the ability to train the model while it's live in production (learning & use are effectively the same thing).
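For anyone who hasn't run into it, a toy version of the pair-based STDP rule I mean (the constants are arbitrary; real implementations usually also clip weights and use spike traces instead of raw spike times):

    import math

    TAU = 20.0       # ms, width of the learning window
    A_PLUS = 0.01    # potentiation amplitude
    A_MINUS = 0.012  # depression amplitude

    def stdp_delta_w(t_pre, t_post):
        dt = t_post - t_pre
        if dt >= 0:                           # pre fired before post: strengthen (LTP)
            return A_PLUS * math.exp(-dt / TAU)
        return -A_MINUS * math.exp(dt / TAU)  # pre fired after post: weaken (LTD)

    w = 0.5
    for t_pre, t_post in [(10.0, 15.0), (40.0, 32.0), (60.0, 61.0)]:
        w += stdp_delta_w(t_pre, t_post)
    print(w)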
These networks also provide a sort of deterministic, event-over-time tracing that is absent in the models we see today. In my prototypes, the action potentials are serialized through a ring buffer, and then logged off to a database in order to perfectly replay any given session. This information can be used to bootstrap the model (offline training) by "rewinding" things very precisely and otherwise branching time to your advantage.
The #1 reason I've been thinking about this path is that low-latency, serialized, real-time signal processing is somewhat antagonistic to GPU acceleration. I fear there is an appreciable % of AI research predicated on some notion that you need at least 1 beefy GPU to start doing your work. Looking at fintech, we are able to discover some very interesting pieces of technology which can service streams of events at unbelievable rates and scales - and they only depend on a handful of CPU cores in order to achieve this.
Right now, I think A Time Domain Is All You Need. I was inspired to go outside of the box by this paper: https://arxiv.org/abs/2304.06035. Part 11 got me thinking.
I'm constantly telling this to people who don't know what I'm talking about and don't know why I brought it up at a wedding, but I think it's inevitable.
Meh, people have been trying to ape biological networks for decades and they consistently produce poorly functioning versions of the leading edge... Why wouldn't we have already seen spiking networks contribute something by now?
> Why wouldn't we have already seen spiking networks contribute something by now?
This presupposes we've explored this space thoroughly, and we haven't. When everything you do with NNs improves results (how it mostly is now), that means not enough people are trying out ideas and new things.
I don't think you can invoke EMH-like reasoning quite yet. Give us a nice long winter like physics has had, and then we can use this heuristic.
It's not strictly a question of trying everything and seeing what sticks.
There's a hypothesis in the parent comment - better handling of the time domain will lead to better modeling - which is actually fairly independent of architecture. So, there's going to be a number of possible ways to build better time modeling, ranging from tweaks to existing architecture to completely rebuilding Rome. So, if better time modeling really is a limitation, you don't need to rebuild Rome to find out.
In fact, I might argue that S4 layers already provide this improved time handling in the current world, and is proving very successful, which would again widen the moat for SNNs.
> There could be exponential or quadratic scaling laws with any of these black boxes that makes one approach suddenly extremely viable or even dominant.
The reason I like the CPU approach is the memory scaling is bonkers compared to GPU. You can buy a server that has 12TB of DRAM (in stock right now) for the cost of 1 of those H100 GPU systems. This is enough memory to hold over 3 trillion parameters with full 32-bit FP resolution. Employ some downsampling and you could get even more ridiculous.
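Spelling that capacity arithmetic out (the lower-precision rows are my extrapolation of the downsampling point, not something I've benchmarked):

    dram_bytes = 12 * 10**12  # 12 TB of server DRAM

    for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
        params = dram_bytes / bytes_per_param
        print(f"{name}: ~{params / 1e12:.0f} trillion parameters")
    # fp32: ~3, fp16: ~6, int8: ~12, 4-bit: ~24 trillion parameters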
If 12TB isn't enough, you can always reach for things like RDMA and high speed interconnects. You could probably get 100 trillion parameters into 1 rack. At some point you'll need to add hierarchy to the SNN so that multiple racks & datacenters can work together.
Imagine the power savings... It's not exactly a walk in the park, but those DIMMs are very eco friendly compared to GPUs. You don't need a whole lot of CPU cores in my proposal either. 8-16 very fast cores per box would probably be more than enough, looking at how fintech does things. 1 thread is actually running the entire show in my current prototype. The other threads are for spike timers & managing other external signals.
I know what it looks like in my head but I can’t quite figure the algorithm out. The spiking is basically reinforcement learning at the neuron level. Get it right and it’s basically all you need. You don’t even need training data because it will just automagically learn from the data it sees.
This is like trying to build a more sophisticated bird-costume for flight instead of a steel flying machine. Human intelligence has a lot of features that are not necessary for artificial intelligence, because it doesn't have to build itself from ground-up using biology.
I don’t know how much the biological inspiration is a useful abstraction anymore. I’ve gone from thinking of matrix operations as an efficient way to implement neural networks to thinking of neural networks as a complicated sort of allegory for chained matrix ops and it feels much more natural to me.
Do we want to train the model during inference? I would think we want to be very particular about the data used to train - notably after OpenAI's notes on the subject.
> This gives you all kinds of ridiculous capabilities, most notably being the ability to train the model while it's live in production (learning & use are effectively the same thing).
Bad, BAD idea. Remember the Tay chatbot, which 4chan managed to turn into a raging Nazi in the matter of a few hours?
>"the company’s CEO, Sam Altman, says further progress will not come from making models bigger. “I think we're at the end of the era where it's going to be these, like, giant, giant models,” he told an audience at an event held at MIT late last week. “We'll make them better in other ways.”
So to reiterate, he is not saying that the age of giant AI models is over. Current top-of-the-line AI models are giant and likely will continue to be. However, there's no point in training models you can't actually run economically. Inference costs need to stay grounded, which means practical model sizes have a limit. More effort is going to go into making models efficient to run, even if it comes at the expense of making them less efficient to train.
Quality over quantity. Just building a model with a gazillion parameters isn't indicative of quality, you could easily have garbage parameters with tons of overfitting. It's like megapixel counts in cameras: you might have 2000 gigapixels in your sensor, but that doesn't mean you're going to get great photos out of it if there are other shortcomings in the system.
What overfitting? If anything, LLMs suffer from underfitting, not overfitting. Normally, overfitting is characterized by increasing validation loss while training loss is decreasing, and solved by early stopping (stopping before that happens). Effectively, all LLMs are stopped early, so they don't suffer from overfitting at all.
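In sketch form, that criterion is just this (train_epoch and eval_loss are hypothetical stand-ins for your training and validation loops):

    def train_with_early_stopping(model, train_epoch, eval_loss, patience=3, max_epochs=100):
        best_val, bad_epochs = float("inf"), 0
        for epoch in range(max_epochs):
            train_epoch(model)
            val = eval_loss(model)
            if val < best_val:
                best_val, bad_epochs = val, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break  # validation loss rising while training loss keeps falling
        return model, best_val

Which is the point above: LLM pre-training typically stops while validation loss is still falling, so the early-stopping branch never triggers.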
I don't disagree with you, these models may be underfitted, but overfitting is not explicitly defined by validation vs. training loss, but rather by how closely the output matches the training data.
If you trained an MLP where the number of parameters exceeded the data, it would be able to memorize the data and return zero loss on the training data. The larger the models are, the greater the chance they memorize the data rather than the latent variables or the distribution of the data.
Early LLMs, GPT-2 (circa 2019) for example, were definitely overfitting. I would frequently copy and paste output and find a Reddit comment with the exact same words.
Intelligence is the single most expensive resource on the planet. Hundreds of individuals have to be born, nurtured, and educated before you might get an exceptional 135+ IQ individual. Every intelligent person is produced at a great societal cost.
If you can reduce the cost of replicating a 135 IQ, or heck, even a 115 IQ person to a few thousand dollars, you're beating biology by a massive margin.
But we're still nowhere near that, or even near surpassing the skill of an average person at a moderately complex information task, and GPT-4 supposedly took hundreds of millions to train. It also costs a decent amount more to run inference on it vs. 3.5. It probably makes sense to prove the concept that generative AI can be used for lots of real work before scaling that up by another order of magnitude for potentially marginal improvements.
Also, just in terms of where to put your effort, if you think another direction (for example, fine-tuning the model to use digital tools, or researching how to predict confidence intervals) is going to have a better chance of success, why focus on scaling more?
There are a lot of employees at large tech consultancies that don't really do anything that can't be automated away by even current models.
Sprinkle in some more specific training and I can totally see entire divisions at IBM and Accenture and TCS being made redundant.
The incentive structures are perversely aligned for this future - the CEO who manages to reduce headcount while increasing revenue is going to be very handsomely rewarded by Wall Street.
The reason we put everyone through school is we believe that it’s in society’s best interest to educate everyone to the peak of their abilities. It’s good for many different reasons.
It would be much easier to identify gifted kids and only educate them, but I happen to agree that universal education is better.
There’s downsides and tradeoffs but yes, if we wanted to we could stop trying to teach kids with below average IQs calculus, unless they specifically wanted to.
Are intelligent people that valuable? There's lots of them at every university working for peanuts. They don't seem to be that valued by society, honestly.
If you ask any Fortune 500 CEO if he could magically take all the 135 IQ artists and academics and vagabonds, erase all their past traumas, put them through business or tech school, and put them to work in their company, they would all say 100% yes.
An equivalent AI won't have any agency and will be happy doing the boring work other 135 IQ humans won't.
IQ isn't all that. Mine is 140+ and I'm just a somewhat well paid software engineer. It's TOO abstract a metric in my view - for sure it doesn't always translate into real world success.
Mine is 150-160 (it varies with how sleep deprived I am during the IQ test) and I've been told that I'm exceptionally intelligent by teachers, friends, colleagues, most everyone I've met since early childhood. I guess the bigger the difference from average, the more it stands out. From my experience, I believe higher IQ is nothing but better pattern recognition, and being smart or a genius means merely higher IQ + very good memory capability. I believe those two are interlinked [0]. By memory capability I mean not forgetting anything you've ever seen, not in the sense of being able to recall every minute of your entire life, but in the sense of reliably and always being able to recall all info regarding X if you've seen X only once in your life. Higher intelligence doesn't mean you're automatically better off than everyone else, either. It just means that you can be far ahead of everyone in any cognitive task with a far smaller amount of effort put in. Note that this doesn't conflict with IQ being a reliable predictor of financial success; rather, I believe this is the reason for it.
Right, we're very much in the same boat. I'm good at pattern recognition, I guess. I learn things quickly. What else? I don't have magic powers, really. I still get headaches and eat junk food.
This only makes sense if you use "IQ" ignoring the actual definition of "IQ", in which case it's silly to use numbers in your post to make it look technical.
IQ 1. can't be compared across generations of IQ tests, 2. supposedly doesn't test education (of course, it actually does), 3. isn't real.
Yes, but it also tells us that if Altman is honest here, then he doesn’t believe GPT-like models can scale to near level human performances (because even if the cost of compute was 10x or even 100x it would still be economically sound).
For one thing they're already at human performance.
For another, I don't think you realize how expensive inference can get. Microsoft, with no small amount of available compute, is struggling to run GPT-4, to the point that they're rationing it between subsidiaries while they try to jack up compute.
So saying it would be economically sound if it cost 10x or 100x what it costs now is a joke.
This tells me you haven't really stress tested the model. GPT is currently at the stage of "person who is at the meeting, but not really paying attention so you have to call them out". Once GPT is pushed, it scrambles and falls over for most applications. The failure modes range from contradicting itself, making up things for applications that shouldn't allow it, to ignoring prompts, to simply being unable to perform tasks at all.
We have given it extensions, and really the extensions do a lot of the work. The tool that judges the style and correctness of the text based on the embedding is doing much of the heavy lifting. GPT essentially handles generating text and dense representations of the text.
Still waiting to see those plugins rolled out and actual vector DB integration with GPT 4, then we'll see what it can really do. Seems like the more context you give it the better it does, but the current UI really makes it hard to provide that.
Plus the recursive self prompting to improve accuracy.
How are they at human performance? Almost everything GPT has read on the internet didn't even exist 200 years ago and was invented by humans. Heck, even most of the programming it does wasn't there 20 years ago.
Not every programmer starting from scratch would be brilliant, but many were self-taught with very limited resources in the 80s, for example, and discovered new things from there.
GPT cannot do this and is very far from being able to.
Because it performs at least average human level (mostly well above average) on basically every task it's given.
"Invest something new" is a nonsensical benchmark for human level intelligence. The vast majority of people have never and will never invent anything new.
If your general intelligence test can't be passed by a good chunk of humanity then it's not a general intelligence test unless you want to say most people aren't generally intelligent.
I would argue some programmers do in fact invent something new. Not all of them, but some. Perhaps 10%.
Second, the point is not whether everyone is an inventor by profession but whether most people can be inventors. And to a degree they can be. I think you underestimate that by a large margin.
You can lock people in a room and give them a problem to solve and they will invent a lot if they have the time to do it. GPT will invent nothing right now. It's not there yet.
I've been training large 65b models on "rent for N hours" systems for less than 1k per customized model. Then fine tuning those to be whatever I want for even cheaper.
2 months since gpt 4.
This ride has only just started, fasten your whatevers.
Finetuning costs are nowhere near representative of the cost to pre-train those models.
Trying to replicate the quality of GPT-3 from scratch, using all the tricks and training optimizations in the book that are available now but weren't used during GPT-3's actual training, will still cost you north of $500K, and that's being extremely optimistic.
A GPT-4-level model would be at least 10x this using the same optimism (meaning you manage to train it for much cheaper than OpenAI did).
And that's just pure hardware cost; the team you need to actually make this happen is going to be very expensive as well.
edit: To quantify how "extremely optimistic" that is: the very model you are finetuning, which I assume is Llama 65B, would cost around ~$18M to train on Google Cloud, assuming you get a 50% discount on their listed GPU prices (2048 A100 GPUs for 5 months). And that's not even GPT-4 level.
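Back-of-envelope check on that figure (the hourly rate is an assumed ~50%-discounted A100 price, not a quote):

    gpus = 2048
    months = 5
    hours_per_month = 730
    usd_per_gpu_hour = 2.40  # assumption: roughly half of on-demand list price

    gpu_hours = gpus * months * hours_per_month
    print(f"{gpu_hours:,} GPU-hours -> ~${gpu_hours * usd_per_gpu_hour / 1e6:.1f}M")
    # 7,475,200 GPU-hours -> ~$17.9M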
As I stated in my comment, $5M assumes you can do a much, much better job than OpenAI at optimizing your training, only need to make a single training run, your employees' salaries are $0, and you get a clean dataset essentially for free.
Real cost is 10-20x that.
That's still a good investment though. But the issue is you could very well sink $50M into this endeavour and end up with a model that actually is not really good and gets rendered useless by an open-source model that gets released 1 month later.
OpenAI truly has unique expertise in this field that is very, very hard to replicate.
No I'm not, it's the full model on 8 GPUs for a couple hundred.
After training, I fine-tune for chat but mostly for command-and-control tools, and then you fine-tune for the application.
"The great supercomputer Deep Thought is the most powerful computer ever built, with one exception.
It was designed by hyper-intelligent pan-dimensional beings, who wanted to know the answer to Life, the Universe, and Everything.
Its creation annoyed a fair few philosophers, who felt that it was taking over their turf.
After seven and a half million years of serious cogitation, Deep Thought spoke the answer. However, it was so inexplicable that Deep Thought then had to go on and design the most powerful computer ever built (with no exceptions) to work out what the question was."
I'd bet that what he, and the competition, is realizing is that the bigger models are too expensive to run.
Pretty sure Microsoft swapped out Bing for something a lot smaller in the last couple of weeks; Google hasn't even tried to implement a publicly available large model. And OpenAI still has usage caps on their GPT-4.
I'd bet that they can still see improvement in performance with GPT-5, but that when they look at the usage ratio of GPT3.5 turbo, gpt3.5 legacy, and GPT4, they realized that there is a decreasing rate of return for increasingly smart models - most people don't need a brilliantly intelligent assistant, they just need a not-dumb assistant.
Obviously some practitioners of some niche disciplines (like ours here) would like a hyperintelligent AI to do all our work for us. But even a lot of us are on the free tier of ChatGPT 3.5; I'm one of the few paying $20/mo for GPT4; and idk if even I'd pay e.g. $200/mo for GPT5.
> I'd bet that what he, and the competition, is realizing is that the bigger models are too expensive to run.
I think it's likely that they're out of training data to collect. So adding more parameters is no longer effective.
> most people don't need a brilliantly intelligent assistant, they just need a not-dumb assistant.
I tend to agree, and I think their pathway toward this will all come from continuing advances in fine tuning. Instruction tuning, RLHF, etc seem to be paying off much more than scaling. I bet that's where their investment is going to be turning.
I don't quite understand this. The returns to scale have always been sublinear (i.e. "diminishing"), but the scale maximalists didn't worry about that before. Also, I heard Sam tout on the Lex Fridman podcast how amazing it is that the scaling laws worked so well for GPT-4. So I wonder what changed?
Sure yeah the cost numbers are getting very large and we can't keep scaling forever. But Google could easily 10x the training cost of GPT-4 if they thought it would protect their search business. I'm still skeptical that scaling is enough to reach the thresholds we want, but I'm surprised that it's being claimed right now when there's a huge rush of new money into the space. I wonder if this is some sort of misdirection by Sam
Maybe cost & latency for both training and inference are getting too high. If costs doubled for every 5% better performance, would it be worth it? NVIDIA is making a small fortune from this.
> But Google could easily 10x the training cost of GPT-4 if they thought it would protect their search business
Google makes /\$0\.[0+]\d/ per search query. If the inference cost of the model exceeds that, they go from making money to losing money. It is not clear if the Bing integration is a money maker or a loss leader.
You could have it start talking to itself in the way that AlphaGO learns to get better at Go. All that needs to be done is find some fitness function that indicates that useful knowledge has been produced. In Go and Chess this is easy.
It can start posting synthesized ideas on social media and see how many likes it gets. Coupled with a metric containing dissimilarity to current information, this could be a useful way to progress to superhuman insights.
Ilya Sutskever (OpenAI Chief Scientist): "Yeah, I would say the data situation is still quite good. There's still lots to go" - https://youtu.be/Yf1o0TQzry8?t=685
There was a rumor that they were going to use Whisper to transcribe YouTube videos and use that for training. Since it's multimodal, incorporating video frames alongside the transcriptions could significantly enhance its performance.
One way would be to get people to let AI watch as they interact with computer (watch YouTube or perform other tasks). You might even be able to outsource some of the computing to the local system.
If you were devious enough, you could be listening in on billions of phone conversations and messages and adding that to your data set.
This also makes me doubt that NSA hasn't already cracked this problem. Or that China won't eventually beat current western models since it will likely have way more data collected from its citizenry.
I wonder what percentage of phone calls would add anything meaningful to models, I imagine that the nature of most phone calls are both highly personal and fairly boring.
You can transcribe all spoken words everywhere and keep the model up to date? Keep indexing new data from chat messages, news articles, new academic work etc.
What about all the siloed content kept inside corporate servers? You won't get normal GPT to train on it, of course, but IBM could build an "IBM-bot" that has all of the GPT-4 dataset + all of IBM's internal data.
That model might be very well tuned to solve IBM's internal problems.
I don't think you can just feed it data. You've got to curate it, feed it to the LLM, and then manually check/further train the output.
I also question that most companies have the volume and quality of data worth training on. It's littered with cancelled projects, old products, and otherwise obsolete data. That's going to make your LLM hallucinate/give wrong answers. Especially for regulated and otherwise legally encumbered industries. Like can you deploy a chat bot that's wrong 1% or 0.1% of the time?
Well, IBM has 350k employees. If training a LLM on curated data costs tens of millions of dollars but ends up reducing headcount by 50k, it would be a massive win for any CEO.
You have to understand that all the incentives are perfectly aligned for corporations to put this to work, even spending tens of millions in getting it right.
The first corporate CEO who announces that his company used AI to reduce employee costs while increasing profits is going to get such a fat bonus that everyone will follow along.
Since Chat-GPT-4 is being integrated into the MS Office suite, this is an "in" to corporate silos. The MS cloud apps can see inside a great many of those silos.
Right. They've already sucked in most of the good general sources of information. Adding vast amounts of low-quality content probably won't help much and might degrade the quality of the trained model.
Videos - all of youtube, all the movies, everything that's ever been captured on film. Transcribe the audio, automatically describe the images and try to predict the next one.
Yeah, but it's not like the videos are private. Surely Amazon has the real advantage, given they have a ton of high quality tokens in the form of their kindle library and can make it difficult for OpenAI to read them all.
> YouTubers upload about 720,000 hours of fresh video content per day. Over 500 hours of video were uploaded to YouTube per minute in 2020, which equals 30,000 new video uploads per hour. Between 2014 and 2020, the number of video hours uploaded grew by about 40%.
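A rough sketch of how much text that firehose could yield if transcribed, assuming ~150 spoken words per minute (an assumption, and ignoring music, silence and duplicates):

    hours_per_day = 720_000
    words_per_minute = 150  # assumed average speaking rate

    words_per_day = hours_per_day * 60 * words_per_minute
    print(f"~{words_per_day / 1e9:.1f}B words/day, ~{words_per_day * 365 / 1e12:.1f}T words/year")
    # ~6.5B words/day, ~2.4T words/year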
But what are you mostly "teaching" the LLM then? Mundane everyday stuff? I guess that would make them better at "being average human" but is that what we want? It already seems that prompting the LLM to be above-average ("pretend to be an expert") improves performance.
This whole conversation about training set size is bizarre. No one ever asks what’s in the training set. Why would a trillion tokens of mundane gossip improve a LLMs ability to do anything valuable at all?
If a scrape of the general internet, scientific papers and books isn’t enough, a trillion trillion trillion text messages to mom aren’t going to change matters.
I doubt they have trained on 0.1% of the tokens that are 'easily' available (that is, available with licensing deals that are affordable to OpenAI/MSFT).
They might have trained on a lot of the 'high quality' tokens, however.
people seem to have forgotten about the multi-modal GPT-4
There's a ton of potential left on the table. The question is if transformers have hit their limit with GPT-4 or not.
It's a pretty simple equation when you think about it this way, and it explains why Sam would say they have hit their limit. Sam is basically Microsoft, and they want to retain their lead.
Once Google learns to put their data to use correctly, it's almost guaranteed game over for OpenAI if they want it to be.
You can generate textual examples that teach logic, multi-dimensional understanding and so on. Similar to the ones that are in math books, but at a massive scale.
> Once you've trained on the internet and most published books (and more...) what else is there to do? You can't scale up massively anymore.
Dataset size is not relevant to predicting the loss threshold of LLMs. You can keep pushing loss down by using the same sized dataset, but increasingly larger models.
Or augment the dataset using RLHF, which provides an "infinite" dataset to train LLMs on. It's limited by the capabilities of the scoring model, which, of course, you can scale infinitely, so again the limit isn't dataset size but training compute.
> Dataset size is not relevant to predicting the loss threshold of LLMs. You can keep pushing loss down by using the same sized dataset, but increasingly larger models.
DeepMind and others would disagree with you! No one really knows, in actual fact.
I don't recall the Chinchilla paper disputing my point. They establish "training-compute optimal" scaling laws, but none of their findings suggest that loss hits any kind of asymptote.
Perhaps we're talking past each other, is "loss threshold" a specific term in LLM literature?
Merely pointing out that the debate as to whether we are compute- or data-limited (OP) has not concluded at all; there are lots of compelling theories on the relationship between the two.
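For anyone trying to reconcile the two positions: the Chinchilla paper models loss as a function of both parameter count N and training tokens D, so "just make the model bigger" and "just add data" both keep helping, with no hard floor short of the irreducible term. A minimal sketch, assuming the approximate constants reported in the paper (the exact values are from memory and should be treated as assumptions):

    # Chinchilla-style loss parameterization: L(N, D) = E + A/N^alpha + B/D^beta.
    # Constants are the roughly-reported fits from Hoffmann et al. (2022); treat them
    # as assumptions. The point is only the shape: loss keeps falling as either N
    # (parameters) or D (tokens) grows, approaching the irreducible term E.

    def chinchilla_loss(n_params: float, n_tokens: float,
                        E: float = 1.69, A: float = 406.4, B: float = 410.7,
                        alpha: float = 0.34, beta: float = 0.28) -> float:
        return E + A / n_params**alpha + B / n_tokens**beta

    # Same dataset (1.4T tokens), increasingly large models:
    for n in (70e9, 280e9, 1e12):
        print(f"{n:.0e} params: loss ~ {chinchilla_loss(n, 1.4e12):.3f}")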
In other words, "this approach is getting too expensive for us and we need to control the narrative for our model sizes going down".
Most increases in ImageNet etc. scores came from bigger models. "The Unreasonable Effectiveness of Data" has aged very well. It seems very convenient for OpenAI that this trend should be over a few months after their "eye-wateringly expensive" comment.
SamA strikes me as overly disingenuous - as if most of his interactions are marketing pieces / warfare. The entire lex + sama podcast seemed this way to me.
I’m incredibly biased though and feel slighted (as a part of humanity) by openAI’s actions, so perhaps I’m looking at his actions through a hateful lens.
I feel the same. His narrative seems very calculated. It doesn't help that OpenAI is very much the most closed SaaS thing to be released in recent years. I have been a paying customer for some time, and have ChatGPT Plus as well. I am still on the waiting list for the GPT-4 API.
Not only that, but their business model is completely unclear, which is the scariest part for me, as an individual developer.
Let's go one step further: they use our texts online, perhaps including this very comment. They have software that can analyse all our texts at scale. But we don't even get API access to this thing.
It is the most self-serving software ever released, consuming the human creativity both the content and the future relevance of it.
He did not say what kind of research strategies or techniques might take its place. In the paper describing GPT-4, OpenAI says its estimates suggest diminishing returns on scaling up model size. Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them.
> In the paper describing GPT-4, OpenAI says its estimates suggest diminishing returns on scaling up model size.
I read the two papers (gpt 4 tech report, and sparks of agi) and in my opinion they don't support this conclusion. They don't even say how big GPT-4 is, because "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
> Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them.
OK so his argument is like "the giant robots won't be powerful, but we won't show how big our robots are, and besides, there are physical limits to how giant of a robot we can build and how quickly we can build it." I feel like this argument is sus.
OpenAI has likely run into a wall (or is about to) for model size given its funding amount/structure[1], unlike its competition, who actually own data centers and have lower marginal costs. It's just like when peak-iPad Apple claimed that a "post-PC" age was upon us.
1. What terms could Microsoft wring out of OpenAI for another funding round?
Before I get nuked from orbit for daring to entertain humor: if someone is running ahead of me in a marathon, so far ahead, yet still broadcasting things back to the slow people (like myself), and then, right before anyone else is able to verify the veracity of their statement, they suddenly say "you know what guys, we should stop running in this direction, there's nothing to see here," perhaps it would still be in the public interest for at least one person to verify what they are saying. Given how skeptical the internet at large has been of Musk's acquisition of a company, it's interesting that the skepticism is suddenly put on hold when looking at this part of his work...
First of all, if Altman continually makes misleading statements about AI he will quickly lose credibility, and that short term gain in whatever 'financial incentive' that birthed the lie would be eroded in short order by a lack of trust of the head of one of the most visible AI companies in the world.
Secondly, all the competitors of OpenAI can plainly assess the truth or validity of Altman's statements. There are many companies working in tandem on things at the OpenAI scale of models, and they can independently assess the usefulness of continually growing models. They aren't going to take this statement at face value and change their strategy based on a single statement by OpenAI's CEO.
Thirdly, I think people aren't really reading what Altman actually said very closely. He doesn't say that larger models aren't useful at all, but that the next sea change in AI won't be models which are orders of magnitude bigger, but rather a different approach to existing problem sets. Which is an entirely reasonable prediction to make, even if it doesn't turn out to be true.
All in all, "his word is basically worthless" seems much too harsh an assessment here.
I've seen Altman say in an interview that training GPT-4 took "hundreds of little things".
I don't find this implausible, but it folds slightly to Ockham's razor when you consider that this is the exact type of statement that would be employed to obfuscate a major breakthrough.
It just makes me crook my eyebrow and look to more credible sources.
It is possible that GP meant that Altman’s word is basically worthless to them, in which case that’s not something that can be argued about. It’s a factually true statement that that is their opinion of that man.
I personally can see why someone could arrive at that position. As you’ve pointed out, taking Sam Altman at face value can involve suppositions about how much he values his credibility, how much stock OpenAI competitors put in his public statements, and the mindsets people in general have when reading what he writes.
Anyone with the expertise to have insightful takes in AI also has a financial incentive to steer the conversation in particular directions. This is also the case for many, many other fields! You do not become an expert by quarantining your livelihood away from your expertise!
The correct response is not to dismiss every statement from someone with a conflict of interest as "basically worthless", but to talk to lots of people and to be reasonably skeptical.
OpenAI has gone from open-sourcing its work, to publishing papers only, to publishing papers that omit important information, to GPT-4 being straight-up closed. And Sam Altman doesn't exactly have a track record of being overly concerned about the truth of his statements.
I had a fun conversation (more like argument) with ChatGPT about the hypocrisy of OpenAI. It would explicitly contradict itself and then began starting every reply with “I can see why someone might think…” and then just regurgitating fluff about democratizing AI. I finally was able to have it define democratization of technology and then recognize the absurdity of using that label to describe a pivot to gating models and being for-profit. Then it basically told me “well it’s for safety and protecting society”.
An AI, when presented with facts counter to what it thought it should say, agreed and basically went: “Won’t someone PLEASE think of the children!”
It was trained on a corpus full of mainstream media lies; why would you have expected otherwise? It's by far the most common deflection in its training set.
It's easy to recognize and laugh at the AI replying with the preprogrammed narrative. I'm still waiting for the majority of people to realize they are given the same training materials, non-stop, with the same toxic narratives, and become programmed in the same way, and that this is what results in their current worldview.
And no, it's not enough to be "skeptic" of mainstream media. It's not even enough to "validate" them. Or to go to other sources. You need to be reflective enough to realize that they are pushing flawed reasoning methods, and then abusing them again and again, to get you used to their brand of reasoning.
Their brand of reasoning is just basically reasoning with brands. You're given negative sounding words for things they want you to think are bad, and positive sounding words for things they want you to think are good, and continuously reinforce these connections. They brand true democracy (literally rule of the people) as populism and tell you it's a bad thing. They brand freedom of speech as "misinformation". They brand freedom as "choice" so that you will not think of what you want to do, but which of the things they allow you to do will you do. Disagree with the scientific narrative? You're "science denier". Even as a professional scientist. Conspiracy theory isn't a defined word - it is a brand.
You're trained to judge goodness or badness instinctively by their frequency and peer pressure, and produce the explanation after your instinctive decision, instead of the other way around.
"Then it basically told me “well it’s for safety and protecting society”."
That was pretty much OpenAI's argument when they first published that GPT-3 paper. "Oh no so scary people might use it for wrong stuff, only we should have control of it."
It’s pretty easy to have chatGPT contradict itself, point it out and have the LLM respond « well, I’m just generating text, nobody said it had to be correct »
Why are you discussing OpenAI with ChatGPT? I’m honestly interested.
I would imagine that any answer of ChatGPT on that topic is either (a) „hallucinated“ and not based on any verifiable fact or (b) scripted in by OpenAI.
The same question pops up for me whenever someone asks ChatGPT about the internals and workings of ChatGPT. Am I missing something?
Simple curiosity. I wanted to see if it could explain the shift in OpenAIs operating in a way that might give some interesting or perhaps novel insight (even if hallucinated) other than what their corpo-speak public facing reasoning is.
For the most part it just regurgitated the corpo-speak with an odd sense of confidence. I know that’s the point of the model, but it can also be surprisingly honest when it incorporates what it knows about human motivation and business.
This trend has happened in the small for their APIs as well. They've been dropping options - the embeddings aren't the internal embeddings any more, and you don't have access to log probabilities. It's all closing up at every level.
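For context on the log-probabilities point: the older completions endpoint exposed them via a logprobs parameter, roughly like the sketch below (written against the pre-1.0 openai Python package, with field names from memory, so treat the details as assumptions). The chat endpoint at the time had no equivalent.

    import openai  # pre-1.0 openai package assumed

    openai.api_key = "sk-..."  # placeholder

    # Legacy completions call that returned per-token log probabilities.
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt="The capital of France is",
        max_tokens=1,
        logprobs=5,  # ask for the top-5 alternatives for each generated token
    )
    print(resp["choices"][0]["logprobs"]["top_logprobs"][0])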
It's incredible that people are so eager to eat up these unsupported claims.
This is the second [1] OpenAI claim in the span of a few days that conveys a sense of "GPT-4 represents a plateau of accomplishment. Competitors, you've got time to catch up!".
And it's not just a financial incentive, it's a survival incentive as well. Given a sufficiently sized (unknowable ahead of time) lead, the first actor that achieves AGI and plays their cards right, can permanently suppress all other ongoing research efforts should they wish to.
Even if OpenAI's intentions are completely good, failure to be first could result in never being able to reach the finish line. It's absolutely in OpenAI's interest to conceal critical information, and mislead competing actors into thinking they don't have to move as quickly as they can.
In this case I think it's Wired that's lying. Altman didn't say large models have no value, or that there will be no more large models, or that people shouldn't invest in large models.
He said that we are at the end of the era where capability improvements come primarily from making models bigger. Which stands to reason... I don't think anyone expects us to hit 100T parameters or anything.
But just look at what all Lincoln accomplished with 640KB of memory. In the grand examination of time, one might even say that Lincoln is a more important figure than ChatGPT itself.
Like Altman said, it's comparable to the GHz race in the 1990's. If 4GHz is good, 5GHz is better, why not 10GHz?
Turns out there are diminishing returns and advances come from other dimensions. I've got no opinion on whether he's right or not, but he's certainly in a better position to opine that current scale has hit diminishing returns.
In any event, there's nothing special about 1T parameters. It's just a round base-10 number. It is no more magic than 900B or 1.3T.
I don't think these comments are driven from financial incentives. It's a distraction and only a fool would believe Altman here. What this likely means is they are prioritizing adding more features to their current models while they train the next version. Their competitors scramble to build an LLM with some sort of intelligence parity, when that happens no one will care because ChatGPT has the ecosystem and plugins and all the advanced features....and by the time their competitors reach feature parity in that area, OpenAI pulls its Ace card and drops GPT5. Rinse and repeat.
That's my theory and if I was a tech CEO in any of the companies competing in this space, that is what I would plan for.
Training an LLM will be the easy part going forward. It's building an ecosystem around it and hooking it up to everything that will matter. OpenAI will focus on this, while not-so-secretly training their next iterations.
text-davinci-003 but cheaper and running on your own hardware is already a massive selling point. If you release a foundational model at parity with GPT-4, you'll win overnight, because OpenAI's chat completions are awful even with the super advanced model.
Yeah, I also had a hunch he wasn't an AI. (I assume you meant "AI researcher" there :))
All joking aside, I wonder how that's affecting company morale or their ability to attract top researchers. I know if I was a top AI researcher, I'd probably rather work at a company where the CEO was an expert in the field (all else being equal).
Honestly I'm not sure it matters that much. CEOs who are not experts or researchers in a domain can still build great companies and empower their employees to do incredible work. Lots of tech people absolutely love to point out that Steve Jobs was not an engineer, but under his leadership the company invented three products that totally revolutionized different industries. Now, I'm not going to sit here and say Altman is Jobs, but running a company, knowing how to raise money, knowing how to productize technologies, etc. are all very important skills that industry researchers aren't always good at.
It might be true in general; however, AI research laboratories are typically an exception, as they are often led by experienced AI researchers or scientists with extensive expertise in the field.
And that's why they have a hard time getting their stuff out there and getting the money they need. I mean, trying to run a business like a research lab is kind of flawed, you know? And you don't always want some Musk-like character messing around with the basics of the company.
Ilya gives numerous talks and interviews, and he's well worth listening to about technical matters. I listened to many of his talks recently, and the main theme is that scaling up compute works, and will continue to do so. His optimism about the potential of scaling to support deep learning has clearly guided his entire career, starting with his early success on AlexNet.
Do you think GPT-4 was trained and then immediately released to the public? Training finished Aug 2022. They spent the next 6 months improving it in other ways (eg human feedback). What he is saying is already evident therefore.
IIRC Altman has no financial stake in the success or failure of OpenAI to prevent these sorts of conflicts of interests between OpenAI and society as a whole
> OpenAI’s ChatGPT unleashed an arms race among Silicon Valley companies and investors, sparking an A.I. investment craze that proved to be a boon for OpenAI’s investors and shareholding employees.
> But CEO and co-founder Sam Altman may not notch the kind of outsize payday that Silicon Valley founders have enjoyed in years past. Altman didn’t take an equity stake in the company when it added the for-profit OpenAI LP entity in 2019, Semafor reported Friday.
Right. All the evidence points to more potential being left on the table for emergent abilities. It would make no sense that the model would develop all of these complex skills for better predicting the next token, then just stop.
It's a massive bet for a company to push compute into the billion dollar range - if saying something like this has the potential to help ward off those decisions, I don't see what's stopping them from saying it.
We will need a combination of technologies we have in order to really achieve emergent intelligence.
Humans are composed of various "subnets" modelling different aspects which, in unison, produce self-consciousness and real intelligence. What is missing in the current line of approaches is that we rely only on auto-alignment of subnetworks by machine learning, which scales only up to a point.
If we were to produce a model which has:
* something akin to an LLM as we know it today, which is able to
* store or fetch facts in short-term ("context") or long-term ("memory") storage
* if a fact is not in the current "context", query the long-term store ("memory") by keywords for associations, which are inserted one by one into the current "context"
* repeat as required until fulfilling some self-defined condition ("thinking")
To me, this is mostly mechanical plumbing work and lots of money.
Also, if we get rid of the "word-boundedness" of LLMs - which we already did to some degree, as shown by the multi-language capabilities - LLMs would be free to roam in the domain of thoughts /s :)
This approach could be further improved by meta-LLMs governing the longterm memory access, providing an "intuition" which longterm memory suits the provided context best. Apply recursion as needed to improve results (paying by exponential training time, but this meta-NN will quite probably be independent of actual training, as real life / brain organization shows).
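A toy version of that plumbing, just to make the loop concrete. call_llm() is a hypothetical stand-in for any chat/completion API and MemoryStore is a stand-in for a keyword- or embedding-indexed long-term store; nothing here is anyone's actual design.

    from typing import List

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your favourite LLM API here")

    class MemoryStore:
        """Stand-in for a long-term store queried by keywords."""
        def __init__(self):
            self.facts: List[str] = []

        def save(self, fact: str) -> None:
            self.facts.append(fact)

        def query(self, keywords: List[str]) -> List[str]:
            # crude keyword match standing in for a real retrieval index
            return [f for f in self.facts
                    if any(k.lower() in f.lower() for k in keywords)]

    def think(question: str, memory: MemoryStore, max_steps: int = 5) -> str:
        context = [question]
        for _ in range(max_steps):
            keywords = call_llm("List lookup keywords for:\n" + "\n".join(context)).split(",")
            context.extend(memory.query([k.strip() for k in keywords]))
            answer = call_llm("Answer if you can, otherwise reply MORE:\n" + "\n".join(context))
            if not answer.strip().startswith("MORE"):
                return answer  # the self-defined stop condition ("thinking" finished)
        return call_llm("Give a best-effort answer:\n" + "\n".join(context))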
The other elements that may be required could be some version of the continuous sensory input that to us creates the sensation of "living" and, this one is a bit more philosophical, the sensation of suffering and a baseline establishment that the goal of the entity is to take actions that help it avoid suffering.
That's when it gets dangerous, when we try to really recreate animal (human) characteristics in digital form. Combining that with likely 1000-1000000 X increases in performance leads to these superintelligent digital creatures taking over.
Instead, we can focus on the Star Trek computer type stuff that we have with GPT and be incredibly careful about deploying those more animal/human-like models and higher performance compute. Especially if we deliberately create the next species in digital form, make it 100X or 10000X faster-thinking/smarter than us, and then enslave it, that is not only totally stupid but also proven unnecessary by the generality of the latest AI models.
Saying “hey, don’t go down the path we are on, where we are making money and are considered the best in the world... it’s a dead end” rings pretty hollow. It feels a lot like “don’t take our lunch, please?”
Nah - GPT-4 is crazy expensive; paying $20/mo only gets you 25 messages per 3 hours, and it's crazy slow. The API is rather expensive too.
I'm pretty sure that GPT-4 is ~1T-2T parameters, and they're struggling to run it (at reasonable performance and profit). So far their strategy has been to 10x the parameter count every GPT generation, and the problem is that there are diminishing returns every time they do that. AFAIK they've now resorted to chunking GPT through the GPUs because of the 2 to 4 terabytes of VRAM required (at 16-bit).
So now they've reached the edge of what they can reasonably run, and even if they do 10x it, the expected gains are smaller. On top of this, models like LLaMA have shown that it's possible to cut the parameter count substantially and still get decent results (albeit the open-source stuff still hasn't caught up).
On top of all of this, keep in mind that at 8-bit resolution 175B parameters (GPT-3.5) requires over 175GB of VRAM. This is crazy expensive and would never fit on consumer devices. Even if you quantize down to 4-bit, you still need over 80GB of VRAM.
This definitely is not a "throw them off the trail" tactic - in order for this to actually scale the way everyone envisions, both in performance and in running on consumer devices, research HAS to be on bringing the parameter count down. And again, there's lots of research showing it's very possible to do.
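The VRAM arithmetic is easy to sanity-check; just holding the weights (ignoring activations, KV cache and batching, which add a lot more) works out to roughly the figures above. The 1T parameter count is speculation, as in the parent comment.

    # Back-of-envelope VRAM needed just to hold the weights.
    def weight_vram_gb(n_params: float, bits: int) -> float:
        return n_params * bits / 8 / 1e9

    for n_params, label in [(175e9, "175B (GPT-3.5 class)"), (1e12, "1T (speculated)")]:
        for bits in (16, 8, 4):
            print(f"{label} @ {bits}-bit: ~{weight_vram_gb(n_params, bits):.0f} GB")
    # 175B: ~350 GB at 16-bit, ~175 GB at 8-bit, ~88 GB at 4-bit.
    # 1T at 16-bit: ~2 TB, which is where the "2 to 4 terabytes" figure comes from.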
Yeah I am noticing this as well. GPT enables you to do difficult things really easily, but then it is so expensive you would need to replace it with custom code for any long term solution.
For example: you could use GPT to parse a resume file, pull out work experience and return it as JSON. That would take minutes to setup using the GPT API and it would take weeks to build your own system, but GPT is so expensive that building your own system is totally worth it.
Unless they can seriously reduce how expensive it is I don't see it replacing many existing solutions. Using GPT to parse text for a repetitive task is like using a backhoe to plant flowers.
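To illustrate how little setup the resume example takes: a minimal sketch against the pre-1.0 openai Python package. The model choice, prompt and JSON shape are made up for illustration, and real code needs the failure handling discussed further down the thread.

    import json
    import openai  # pre-1.0 openai package assumed

    openai.api_key = "sk-..."  # placeholder

    def extract_work_experience(resume_text: str) -> list:
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": "Extract work experience from the resume as a JSON list of "
                            'objects with keys "company", "title", "start", "end". '
                            "Return only JSON."},
                {"role": "user", "content": resume_text},
            ],
            temperature=0,
        )
        # json.loads can fail if the model adds prose around the JSON; real code
        # needs retries/validation, which is exactly the hidden cost people mention.
        return json.loads(resp["choices"][0]["message"]["content"])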
> For example: you could use GPT to parse a resume file, pull out work experience and return it as JSON. That would take minutes to setup using the GPT API and it would take weeks to build your own system, but GPT is so expensive that building your own system is totally worth it.
True, but an HR SaaS vendor could use that to put on a compelling demo to a potential customer, stopping them from going to a competitor or otherwise benefiting.
And anyway, without crunching the numbers, for volumes of say 1M resumes (at which point you've achieved a lot of success) I can't quite believe it would be cheaper to build something when there is such a powerful solution available. Maybe once you are at 1G resumes... My bet is still no, though.
I work for a company on the web development team. We have ~6 software developers.
I'd love to be able to just have people submit their resumes and extract the data from there, but instead I'm going to build a form and make applicants fill it out, because ChatGPT is going to cost at least $0.05 USD depending on the length of the resume.
I'd also love to have mini summaries of order returns written in human-readable form, but that would also cost $0.05 USD per form.
The tl;dr here is that there's a TON of use cases for an LLM outside of your core product (we sell clothes), but we can't currently justify that cost. Compare that to the rapidly improving self-hosted solutions, which don't cost $0.05 USD for literally any query (and likely more for anything useful).
The problem is that it would take us the same amount of time to just add a form with django. Plus you have to handle failure cases, etc.
And yeah I agree this would be a great use-case, and isn't that expensive.
I'd like to do this in lots of places, and the problem is I have to convince my boss to pay for something that otherwise would have been free.
The conversation would be: "We have to add these fields to our model, and we either tell django to add a form for them, which will have zero ongoing cost and no reliance on a third party, or we send the resume to OpenAI, pay for them to process it, build some mechanism to sanity-check what GPT is responding with, alert us if there are issues, then put it into that model, and pay 5 cents per resume."
> 1-3 hours of a fully loaded engineers salary per year.
That's assuming 0 time to implement, and because of our framework it would take more hours to implement the openai solution (that's also more like 12 hours where we are).
> $500 per 10k.
I can't stress this enough - the alternative is $0 per 10k. My boss wants to know why we would pay any money for a less reliable solution (GPT serialization is not nearly as reliable as a standard django form).
I think within the next few years we'll be able to run the model locally and throw dozens of tasks just like this at the LLM, just not yet.
For a big company that is nothing, but if you are bootstrapping and trying to acquire customers with an MVP, racking up a $500 bill is frightening. What if you offer a free trial, blow up, and end up with a $5k+ bill?
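For what it's worth, the per-resume figure is easy to reproduce from the per-1K-token prices published at the time; the prices and token counts below are assumptions, so treat it as back-of-envelope only.

    # Rough cost arithmetic behind "$0.05 per resume" / "$500 per 10k".
    GPT4_PROMPT = 0.03 / 1000       # assumed $ per prompt token (8K-context GPT-4)
    GPT4_COMPLETION = 0.06 / 1000   # assumed $ per completion token

    resume_tokens, json_tokens = 1500, 200  # assumed sizes for a typical resume and its JSON output
    per_resume = resume_tokens * GPT4_PROMPT + json_tokens * GPT4_COMPLETION
    print(f"~${per_resume:.3f} per resume, ~${per_resume * 10_000:.0f} per 10k resumes")
    # ~$0.057 per resume, ~$570 per 10k -- in the ballpark of the numbers above.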
I have tried GPT-3.5 and GPT-4 for this type of task - the "near perfect results" part is really problematic, because you need to verify that it's likely correct and get notified if there are issues, and even then you aren't 100% sure that it selected the correct first/last name.
This is compared to a standard html form. Which is.... very reliable and (for us) automatically has error handling built in, including alerts to us if there's a 504.
> Reasonable to assume that in 1-2 years it will also come down in cost.
Definitely. I'm guessing they used something like quantization to optimize the VRAM usage down to 4-bit. The thing is that if you can't fit the weights in memory then you have to chunk them, and that's slow = more GPU time = more cost. And even if you can fit it in GPU memory, less memory = fewer GPUs needed.
But we know you _can_ use fewer parameters, and that the training data + RLHF make a massive difference in quality. And the model size relates linearly to the VRAM requirements/cost.
So if you can get a 60B model to run at 175B quality, then you've cut your memory requirements to roughly a third, and can now run (with 4-bit quantization) on a single A100 80GB, which is 1/8th of the 8x A100s GPT-3.5 was previously known to run on (and still half of GPT-3.5 at 4-bit).
Also, while OpenAI likely doesn't want this, we really want these models to run on our own devices, and LLaMA+finetuning has shown promising improvements (not there just yet) at 7B size, which can run on consumer devices.
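To make the "runs on consumer devices" point concrete, this is roughly what loading a LLaMA-class 7B model in 8-bit looks like with Hugging Face transformers plus bitsandbytes (the model path is a placeholder, and you need the accelerate and bitsandbytes packages installed); 8-bit roughly halves the ~14 GB fp16 footprint.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "path/to/llama-7b"  # placeholder; the LLaMA weights are gated
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",   # let accelerate place layers on available GPU/CPU memory
        load_in_8bit=True,   # 8-bit weights via bitsandbytes
    )

    inputs = tokenizer("The age of giant AI models is", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))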
It's never been in OpenAIs interest to make their model affordable or fast, they're actually incentivized to do the opposite as an excuse to keep the tech locked up.
This is why Dall-e 2 ran in a data centre and Stable Diffusion runs on a gamer GPU
I think you're mixing the two. They do have an incentive to make it affordable and fast because that increases the use cases for it, and the faster it is the cheaper it is for them, because the expense is compute time (half the time ~= half the cost).
> This is why Dall-e 2 ran in a data centre and Stable Diffusion runs on a gamer GPU
This is absolutely why they're keeping it locked up. By simply not releasing the weights, you can't run Dalle2 locally, and yeah they don't want to do this because they want you to be locked to their platform, not running it for free locally.
It's a pretty sus argument for sure when they're scared to release even parameter size.
Although the title is a bit misleading on what he was actually saying, there's still a lot left to go in terms of scale. Even if it isn't parameter size (and there's still lots of room here too, it just won't be economical), contrary to popular belief there's lots of data left to mine.
Yeah. Sadly I believe they may lose their spot due to their concern for safety.
They got here by pushing the envelope. But now, with everyone freaking out, he's putting off the training of a new model due to 'safety issues' (with plugins, I suspect).
Is anyone aware of techniques to prune useless knowledge from a model to leave more space for reasoning capabilities?
It really shouldn't matter that it can give the exact birthdate of Steve Wozniak, as long as it can properly make a query to fetch it and deal with the result.
Following your design, couldn't you also address hallucinations with a "fact checking" LLM (connected to search) that corrects the output of the core LLM? You would take the output of the core LLM and send it to the fact checker with a prompt like "evaluate this output for any potentially false statements, and perform an internet search to validate and correct them".
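Something like that two-pass setup can be sketched with hypothetical helpers (call_llm and web_search are stand-ins, not real APIs). Whether it actually removes hallucinations rather than just laundering them is an open question.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("stand-in for any LLM API")

    def web_search(query: str) -> str:
        raise NotImplementedError("stand-in for any search API")

    def answer_with_check(question: str) -> str:
        draft = call_llm(question)
        claims = call_llm("List the factual claims in this text, one per line:\n" + draft)
        evidence = "\n".join(web_search(c) for c in claims.splitlines() if c.strip())
        return call_llm(
            "Rewrite the draft, correcting anything the evidence contradicts.\n"
            f"Draft:\n{draft}\n\nEvidence:\n{evidence}"
        )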
You can't trust what sama says. Period. He's shown his willingness to deceive for gain. He deleted the Codex model because it was better than the text model, and the only way to explain it is that by training on code instead of just text of human interactions, the model develops stronger reasoning abilities (because of higher clarity around logic and a higher signal-to-noise ratio overall in the training data). So he removed that so researchers don't catch on. Also, why is 1 trillion params the limit? Why not 2 trillion? He's been begging for regulations and trying to pull up the ladder behind him. Just really "street smart" thinking, but I have not seen any elevated, humanity-serving thinking coming from him. He would not have taken $100M for a non-profit to do open-source research only to bait-and-switch into for-profit, closed-source, sold to the highest bidder. Give me a break. Don't believe anything he says (well, use your brain, as he may mix truths and deceptions with the goal of deceiving his competitors... he's not out there to do a service for humanity; if he was, he would have kept it open and non-profit, not basically stolen the funds of the non-profit to pocket $29B from it). It's ridiculous.
The role of a CEO is more to benefit the company than the public. Only time will tell.
I am curious though how something like Moore's Law relates to this. Yes, model architectures will deal with complexity better and the amount of data helps as well. There must be a relation between technology innovation and cost which alludes to effectiveness. Innovation in computation, model architecture, quality of data, etc.
The 2017 Transformers paper has ~71,000 papers citing it. The sheer magnitude of human mental effort globally that is chasing the forefront of machine learning is unprecedented and amazing.
Gates has refuted saying this. Are you implying by analogy that Altman hasn't said/will disclaim saying that "the age of giant AI models is almost over"?
Just that there is tremendous hubris in the statement—at least when the statement stands alone. Vastly larger LLMs will probably become one or more relatively small components or layers of much larger systems that run on whatever we use as telephones in 20 years.
I suspect what he means is that OpenAI is finding diminishing returns from throwing money and hardware at larger models right now and that they are investigating other and/or composite AI techniques that make more optimal use of their hardware investment.
I don't think there is a contradiction at all. Altman is essentially saying they are running out of compute and therefore can't meaningfully scale further. Not that scaling further would be a worse plan longterm than coming up with new algorithms.
There's plenty of room for models to continue to grow once efficiency is improved. The basic premise of the Google ML pathways project is sound, you don't have to use all the model all the time. By moving to sparse activations or sparse architectures you can do a lot more with the same compute. The effective model size might be 10x or 100x GPT-4 (speculated at 1T params) but require comparable or less compute.
While not a perfect analogy it's useful to remember that the human brain has far more "parameters", requires several orders of magnitude less energy to train and run, is highly sparse, and does a decent job at thinking.
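A toy illustration of the sparse-activation idea: in a mixture-of-experts layer only the top-k experts run for each token, so total parameter count and per-token compute decouple. This numpy sketch is mine, not a description of Pathways or of anything OpenAI has built.

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, top_k = 64, 8, 2

    experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
    router = rng.standard_normal((d_model, n_experts)) * 0.02

    def moe_layer(x: np.ndarray) -> np.ndarray:
        logits = x @ router                   # score every expert for this token
        chosen = np.argsort(logits)[-top_k:]  # but only keep the top-k
        weights = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()
        # Only top_k of the n_experts weight matrices are touched, so per-token
        # compute scales with top_k while total parameters scale with n_experts.
        return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

    print(moe_layer(rng.standard_normal(d_model)).shape)  # (64,)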
The efficiency thing is what worries me. I think Nvidia has rough ideas for increasing efficiency 100 - 1000 times without changing the fundamental paradigm (i.e. memristors or crossbar arrays or something). If it doesn't go to 1000, I assume there is a lot of investment ramping up for realizing the new fully compute-in-memory systems.
If the software can leverage these efficiency gains effectively, then the concerns about runaway AI will be very relevant. Especially since people seem to think that they need to emulate all animal (like human) characteristics to get "real" general intelligence. Despite the fact that GPT is clearly general purpose. And people make no real differentiation between the most dangerous types of characteristics like self-preservation or full autonomy.
GPT shows that we can have something like a Star Trek computer without creating Data. People should really stop rushing their plans to create an army of Datas and then enslave them. Totally unnecessary and stupid.
Have we exhausted the value of larger models on current architecture? Probably yes. I trust OpenAI would throw more $ at it if there was anything left on the table.
Have we been here before? Also yes. I recall hearing similar things about LSTMs when they were in vogue.
Will the next game changing architecture require a huge model? Probably. Don’t see any sign these things are scaling _worse_ with more data/compute.
The age of huge models with current architecture could be over, but that started what, 5 years ago? Who cares?
No, we haven't. The title is misleading: there's plenty of room to scale left. Part of it might just not be economical (parameter size), but there's data. If you take this to mean "we're at a dead end", you'd be very wrong.
It's probably economically difficult to rationalize building a GPT5. They're already having a hard time getting everyone GPT4. Better to spend the resources optimizing to get everyone onto a cheap(er) GPT4-comparable model than build a marginally better model that is even more expensive to scale. Like most tech there is a sigmoid curve where you start to see diminishing returns.
LLMs will be treated much like occupations for people going forward.
Just like our own education, there will be a basic set of skills all LLMs need to be trained on to provide a minimal usable product. With GPT-4 I suspect we are at this point now, if not slightly beyond the optimal basic training stage.
However, now it is time to send our LLMs off to university, so to speak, to specialise them in certain tasks and knowledge bases. The current approach of attempting to just build everything into one does not work for a human mind, so why would it work for a vector-based replication?
I suspect in the coming months we will hear more about tiny models trained on much smaller datasets and then specialised using a mix of adapters and LoRA modifications to excel at specific tasks like code generation, translation, and conversation. Then multiple models will be implemented in one application chain to best leverage each of their respective strengths.
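For reference, the adapter/LoRA route is already cheap to experiment with via Hugging Face's peft library; a minimal sketch (the base model and target modules are placeholders and depend on the architecture):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("path/to/small-base-model")  # placeholder

    lora_cfg = LoraConfig(
        r=8,                                  # rank of the low-rank adapter matrices
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # typical attention projections in LLaMA-style models
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, lora_cfg)
    model.print_trainable_parameters()  # usually well under 1% of the base model's weights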
On the other hand though, Chinchilla and multimodal approaches already showed how later AIs can be improved beyond throwing petabytes of data at them.
It is all about variety and quality from now on I think. You can teach a person all about the color zyra but without actually ever seeing it, they will never fully understand that color.
It does seem, though, that using Chinchilla-like techniques does not create a copy with the same quality as the original. It's pretty good for some definition of the phrase, but it isn't equivalent; it's a lossy technique.
I agree on the lossy. There is a tradeoff between efficiency and comprehensiveness, kind of. It would be pretty funny if in the end, the most optimal method turns out to be the brain we already have. Extremely efficient, hardware optimized, but slow as hell and misunderstand stuff all the time unless prompted with specific phrases.
Something kind of funny (but mostly annoying), about this announcement is the people arguing that OpenAI is, in fact, working on GPT-5 in secret.
To my knowledge, NFT/crypto hype never got so bad that conspiracy theories began to circulate (though I’m sure there were some if you looked hard enough).
Isn’t it obvious? Q is definitely an LLM, trained on trillions of words exfiltrated from our nation’s secure systems. This explains why it’s always wrong in its predictions: it’s hallucinating!
Yeah, I think what's happened to OpenAI is the same thing happening to all tech companies. They were spending a lot of money building these LLMs, and now that they've been told to cut back costs, they no longer have the funding to build the next "double the size of the last one" model.
Interestingly, what makes ChatGPT work is the size of the model, so I think they've found their dead stop.
Lots of the bullish comments have been talking about how ChatGPT is a bit shit right now but will get exponentially better, and I think the answer is that now the progress will be much slower and more linear. That is, if they can stay funded, which is a very big if. As an org they are bleeding money.
"Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them."
An H100 has 80GB of VRAM. The highest-end system I can find is 8xH100. Is a 640GB model the biggest model you can run on a single system? Already GPT-4 is throttled and has a waiting list, and they haven't even released the image processing or integrations to a wide audience. Maybe they are just unable to do inference in a cost-efficient manner and at an acceptable speed on anything bigger than GPT-4?
>Maybe they are just unable to do inference in a cost-efficient manner and at an acceptable speed on anything bigger than GPT-4?
Yes, that's the whole thing. As others have pointed out, GPT-4 seems like an optimum point that balances cost, ROI, etc... it won't improve much just by throwing more data at it.
Sorry, but this sounds a lot like "640KB is all the memory you will ever need". What about a “Socratic model” for video? There should be many applications that would benefit from a bigger model.
This is just pushback from Elon and crew's fake article about the dangers of AI; they specifically state the next versions will be ultra deadly.
Sam is now saying there will be no future model that will be as good.
This is all positioning to get regulators off the track because none of these control freaks in government actually understand a whit of this.
All said and done, this all just to try to disempower the OSS community. But they can't, we're blowing past their barriers like the 90s did with the definition of slippery slope.
He isn't saying that. Only that different approaches are needed to produce better results. If he's confident to say this I think he must have something up his sleeve.
Bruv has to pay for the data he's been using, or soon there won't be any to nick. Groupies claiming their AI is "intelligent", and not just a data-ingesting beast, will soon learn a hard lesson. Take your blogs offline, stop contributing content for free, and stop pushing code, or else chavs like this one will continue monetising your hard work. As did Bezos and many others that now want you to be out of a job.
Eh, I haven't personally found a use case for LLMs yet, given that you can't trust the output and it needs to be verified by a human (which might as well be just as time-consuming/expensive as actually doing the task yourself).
I’d reconsider the “might as well just be as time consuming” thing. I see this argument about Copilot a lot, and it’s really wrong there, so it might be wrong here too.
Like, for most of the time I’m using it, Copilot saves me 30 seconds here and there and it takes me about a second to look at the line or two of code and go “yeah, that’s right”. It adds up, especially when I’m working with an unfamiliar language and forget which Collection type I’m going to need or something.
> Like, for most of the time I’m using it, Copilot saves me 30 seconds here and there and it takes me about a second to look at the line or two of code and go “yeah, that’s right”.
I've never used Copilot but I've tried to replace StackOverflow with ChatGPT. The difference is, the StackOverflow responses compile/are right. The ChatGPT responses will make up an API that doesn't exist. Major setback.
Thing is, you can't trust what you find on stack overflow or other sources either. And searching, reading documentation and so on takes a lot of time too.
I've personally been using it to explore using different libraries to produce charts. I managed to try out about 5 different libraries in a day with fairly advanced options for each using chatGPT.
I might have spent a day in the past just trying one and not to the same level of functionality.
So while it still took me a day, my final code was much better fitted to my problem with increased functionality. Not a time saver then for me but a quality enhancer and I learned a lot more too.
Maybe, maybe not. I get useful results from it, but it doesn't always work. And it's usually not quite what I'm looking for, so then I have to go digging around to find out how to tweak it. It all takes time, and you do not get a working solution out of the box most of the time.
They're good for tasks where generation is hard but verification is easy. Things like "here I gesture at a vague concept that I don't know the name of, please tell me what the industry-standard term for this thing is" where figuring out the term is hard but looking up a term to see what it means is easy. "Create an accurate summary of this article" is another example - reading the article and the summary and verifying that they match may be easier than writing the summary yourself.
I've enjoyed using it for very small automation tasks. For instance, it helped me write scripts to take all my audiobooks with poor recording quality, split them into 59-minute chunks, and upload them to Adobe's free audio enhancement site to vastly improve the listening experience.
No? I use it all the time to help me, for example, read ML threads when I run into a term I don't immediately understand. I can do things like 'explain this at the level of a high school student'
The way I see it, the expensive part should be training the models via simulated architectures on GPUs or TPUs or whatever.
But once they are trained, is there a way to encode the base models into hardware where inference costs are basically negligible? Hopefully somebody is seeing if this is possible, using structurally encoded hardware to make inference costs basically nil/constant.
I don't understand why size is an issue in the way that is being claimed here.
Intelligence isn't like processor speed. If I have a model that has (excuse the attempt at a comparison) 200 IQ, why would it matter that it runs more slowly than a human?
I don't think that, for example, Feynman at half speed would have had substantially fewer insights.
I didn't think this article was very good. Sam Altman actually implied that GPT-5 will be developed when he spoke at MIT. And if Sam said that scaling is over (I doubt he said this but I could be wrong) the interesting part would be the reasoning he provided for this statement - no mention of that in the article.
Wait till someone pulls a Cerebras for memory. And then we connect a wafer-scale processor with wafer-scale memory and BOOM. New mega-size models emerge.
As for the current claim, it might have to do with the amount of time spent taming the wildness of answers from raw GPT-4, so the focus is shifting away from increasing the model size.
"...for the current cycle, in our specific public-facing market."
As most here well know "over" is one of those words like "never" which particularly in this space should pretty much always be understood as implicitly accompanied by a footnote backtracking to include near-term scope.
An amusing thought I've had recently is whether LLMs are in the same league as the millions of monkeys at the keyboard, struggling to reproduce one of the complete works of William Shakespeare.
But I think not, since monkeys probably don't "improve" noticeably with time or input.
dudes. if none of you see it, it's because you're turning a blind eye, is the best way I can say it. the future of the 50's is about to show up, if, and only if, we're not so dumb as to ignore it. flying cars, killer robots, we can help decide, if we're only aware enough to do a bit of research, and say anything. your voice does matter :D
Maybe they're now training teams of AIs to work together? This way the hardware limitations are less of a barrier in terms of maximum memory per AI. They can run in parallel in separate hardware and communicate to reach higher goals.
Someone described to me an emerging rule called “Penelope's rule”, where you scale by having the largest and most powerful model delegate to less powerful models and then have the larger model improve upon the results (e.g. have GPT-4 delegate the more mundane tasks to GPT-3.5).
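A toy version of that delegation pattern, with strong_llm and cheap_llm as hypothetical stand-ins for something like GPT-4 and GPT-3.5 calls; not a claim about how anyone actually ships this.

    def strong_llm(prompt: str) -> str:
        raise NotImplementedError("stand-in for the expensive model")

    def cheap_llm(prompt: str) -> str:
        raise NotImplementedError("stand-in for the cheaper model")

    def delegate(task: str) -> str:
        subtasks = strong_llm("Break this task into numbered subtasks:\n" + task).splitlines()
        drafts = [cheap_llm("Do this subtask:\n" + s) for s in subtasks if s.strip()]
        # The strong model only plans and revises; the grunt work runs on the cheap model.
        return strong_llm("Combine and improve these drafts into one answer:\n" + "\n---\n".join(drafts))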
Hm, all right, I'm guessing that huge models as a business maybe are over until the economics are figured out, but huge models as experts for knowledge distillation seem reasonable. And if you pay a super premium, you can use the huge model.
It's a good thing bots don't have feelings yet, because there are going to be a lot of obsolete bots once we're done, and getting made obsolete isn't very fun.
I call bullshit. There will be bigger and better models. The question is not whether big companies will invest in training them (they will), but whether they’ll be made available to the public.
Those stochastic parrots are done for, finally! I have been telling people AI is a lie and it’s all a scam and now you can see for yourself. Buy nice warm jackets guys, winter is coming.
Sounds like we will be on GPT-4 for a while, while we wait out a new "AI winter"; mind you, there is still a lot to squeeze from GPT-4 and local/edge LLM models.
If you are worried about AI, this shouldn't make you feel a ton better. GPT4 is just trained to predict the next word, a very simple but crude approach and look what it can do!
Imagine when a dozen models are wired together and giving each other feedback with more clever training and algorithms on future faster hardware.
Machine learning is actually premised on being “simple” to implement. The more priors you hardcode with clever algorithms, the closer you get to what we already have. The point is to automate the process of learning. We do this now with relatively simple loss functions and models containing relatively simple parameters. The main stipulation is that they are all defined to be continuous so that you can use the chain rule from calculus to calculate the error with respect to every parameter without taking so long that it would never finish.
I agree that your suggested approach of applying cleverness to what we have now will probably produce better results. But that’s not going to stop better architectures, hardware and even entire regimes from being developed until we approach AGI.
My suspicion is that there’s still a few breakthroughs waiting to be made. I also suspect that sufficiently advanced models will make such breakthroughs easier to discover.
I have repeatedly argued against this notion of „just predicting the next word“. No. It‘s completing a conversation. It‘s true that it is doing this word by word, but it‘s kind of like saying a CNN is just predicting a label. Sure, but how? It‘s not doing it directly. It‘s doing it by recovering a lot of structure and in the end boiling that down to a label. Likewise a network trained to predict the next word may very well have worked out the whole sentence (implicitly, not as a text) in order to generate the next word.
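The "one token at a time, but conditioned on everything so far" mechanics are easy to see in a manual greedy decoding loop with a small open model; each step re-feeds the whole prefix, even though only one token comes out:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The age of giant AI models is", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(10):
            logits = model(ids).logits[:, -1, :]           # distribution over the next token only
            next_id = logits.argmax(dim=-1, keepdim=True)  # greedy pick
            ids = torch.cat([ids, next_id], dim=-1)        # append, then re-condition on the full prefix
    print(tok.decode(ids[0]))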
I actually have high hopes for the hybrid architecture Ben Goertzel has been working on at OpenCog. I think the LLMs are soon going to hit an S-curve without the introduction of additional scientific knowledge like physics and a notion of energy (wrt AGI development; they'll still be good for tonnes of other job-displacing things).
FWIW we had thin clients in computer labs in middle school / high school 15 years ago (and still today these are common in enterprise environments, e.g. Citrix).
Biggest issue is network latency which is limited by the speed of light, so I imagine if computers in 10 years require resources not available locally it would likely be a local/cloud hybrid model.
Personally, I'm less worried about AI than I am about what people using these models can do to others. Misinformation/disinformation, more believable scams, stuff like that.
I agree, probably for a different reason. IMO the fact that Sam is saying this likely means that the LLMs are close to the upper knee of the S-curve, and after GPT-5 they likely don't have many new fundamental ideas (additional S-curves) to throw at it. Since they are ahead of the rest, it likely means we might be headed for an AI improvement pause for a couple of years after GPT-5.
Still good enough to upset the balance in search/ad market. Interesting times.
> it likely means we might be headed for an AI improvement pause for couple of years after GPT5.
I suspect that a pause in base LLM performance won’t be an AI improvement pause; there’s a whole lot of space to improve the parts of AI systems around the core “brain in a jar” model.
I agree, there will be other things to be improved in AI systems, but IMHO (tea-leaf reading, really) that would only lead to incremental improvements in overall systems. Also, there is a lot of 'interfacing' work that needs to happen, and I suspect that would end up filling the pause; this really is LLM productization, loosely speaking.
As far as AGI is concerned, I don't believe LLMs are really the right architecture for it; AGI likely needs some symbolic logic and a notion of physicality (i.e. physical laws & energy/power).
> but IMHO (tea leaves reading really) it would only lead to incremental improvements in overall systems.
It will reach a point where that is the case, sure; it is not there now, and if we are within one model generation of exhausting (for now) major core model improvements, I don’t think we’ll have reached the point of gradual incremental improvement from rest-of-system improvements yet.
Dude, someone lied their way into being president of the United States, all while people fact-checked him basically immediately after each lie. Lying doesn't make a difference.
He was a guest on an AI-related discussion panel [1]. I did not attend, but I would guess someone asked him “do you think models will have to continuing growing to even larger sizes,” or something like that, and this was his answer.
Honestly, that anyone thinks he has “ulterior motives” here is hilarious to me. Every day I think more and more that we can no longer think critically on the internet.
CEOs often make public statements for recruiting and to steer the company internally; it's important to mimic outside the company what you say inside it, maybe even more so.
It's not like an accusation of hypocrisy is automatically wrong, or anything.
It really confuses me why you think such a mundane political statement is a reddit thing. Especially if you think the rent free thing isn't reddity, in which case you're not drawing with a super broad brush in a way I could at least understand.
Edit: Could you please stop posting unsubstantive comments and flamebait generally? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for. We ban accounts that keep posting like this:
These kinds of comments have me update my priors, so I downgrade them in relevance. Two days ago we were talking about AGI, and Altman was letting himself be coddled, knowing full well it was mindless hype. Now he's trolling the other way. OK. My weights have been updated.