The point is that we've now hit diminishing returns for increasing model size, unless we find a better modeling architecture than the Transformer.
I think this is likely true; while all the other companies underestimated the capability of the transformer (including Google itself!), OpenAI made a fairly accurate bet on the transformer based on the scaling laws, put in all the effort to squeeze it to the last drop, and took all the rewards.
It's likely that GPT-4 is at the optimal spot between cost and performance and there won't be significant improvements in performance in the near future. I guess the next task will be more about efficiency, which has significant implications for productionization.
Bet Google won’t make that mistake again, i.e. it won’t publish as much and will be much more careful about what it publishes, lest it give a competitor a useful tool and get nothing in return - when the competitor (in this case very ironically named) goes full commercial and closed-sources everything it can.
Open collaboration in AI, at least when it comes to corporations, might have come to an end.
The scale of the damage OpenAI has done to the trust ecosystem, by soliciting not just the work but also massive fundraising and then privatizing the profits, is almost unprecedented and permanent.
What trust ecosystem are you talking about? It was a lack of foresight by Google on their own discovery of transformers, and it would probably have been gathering dust or been killed off in the time it would have taken them to reach GPT-2 levels of progress.
The trust ecosystem of the AI community - almost every breakthrough was published publicly until OpenAI decided to take advantage of that.
Besides that, this comment contained a ton of statements about what “would” have happened had Google not published. Interesting, but a worthless way to defend OpenAI's actions.
Seems to me that it's very hard to have a moat in LLMs without proprietary code, given that most of their training data is freely available. This is very different from the 2010s era of AI, when models were trained on large amounts of proprietary data that was specific to a given service and could not be replicated.
It's a lot easier to pat yourself on the back for releasing a paper about your techniques when your competitors can't replicate your service with it. I think that as generative AI models move past the hype phase into the competitive phase, they will be keeping a lot of innovation proprietary for at least a few years to maintain an edge over their competitors.
Let's just hope they don't move to patenting everything.
The point of OpenAI was that no single company would have a moat around LLMs or foundation models in general. It was set up as a nonprofit with this goal in mind and got money for it.
Whether Google pats itself on the back or not for releasing a paper no one could replicate is not important, because open research had never been the company's goal. What happened is that a for-profit company released a paper that gave a huge advantage to a company whose mission was to ensure that no one has a huge advantage in the field. OpenAI was then converted to for-profit and established an exclusive relationship with Microsoft.
Google fucked up and missed the train, but they can catch up. It's much harder for smaller companies if, as a result of this, Fb, Google, etc. lock down their AI research papers and tools for internal use only.
Doesn't Google have any patents on the transformer architecture? I assume large enterprises tend to patent everything that comes out of their research.
GPT models are based on the transformer, but the architecture is different from what's patented.
Not a lawyer, but can you really patent a particular network architecture? Theoretically someone could invent a new activation function that just happens to make the same architecture perform a lot better on some tasks; can a patent really cover that?
I suspect you are right. We may be stuck at GPT-4 sizes for a bit just because of hardware costs, though. As they get bigger it costs too much to run them, until our hardware becomes more optimized for these large models at 4 bits or so.
I think YouTube videos are going to be the next big training set. A transformer trained on all text and all of YouTube will be killer amazing at so much. I bet it can understand locomotion and balance and body control from YouTube.
I wonder if TPUs, like Google's Tensor chip, will beat out GPUs when it comes to image/video based training?
NewPipe/Freetube/Invidious instances + SponsorBlock API support are really great at filtering out all of these useless advertising memes. Somehow ($$$) the behavior of TV advert culture has seeped into YouTube. I've not watched broadcast/cable TV in years, but when I interact with people who do, inevitably, they make similar references. It's super weird.
Yeah, SponsorBlock (https://sponsor.ajay.app/) is crowdsourced data for the in-video ads. Weirdly enough, I've never contributed timestamps, but the vast majority of the content I watch has submitted timestamps. I support a bunch of creators on patreon, but very much disagree with their double dipping.
> I wonder if TPUs, like Google's Tensor chip, will beat out GPUs when it comes to image/video based training?
One of the OpenAI guys was talking about this. He said the specific technology does not matter, it is just a cost line item. They don't need to have the best chip tech available as long as they have enough money.
That said I am curious if anyone else can really comment on this. It seems like as we get to very large and expensive models we will produce more and more specialized technology.
Whether or not cost matters much depends on your perspective.
If you’re OpenAI and GPT4 is just a step on the way to AGI, and you can amortize that huge cost over the hundreds of millions in revenue you’re gonna pull in from subscriptions and API use… then sure you’re probably not very cost sensitive. It could be 20% cheaper or 50% more expensive, whatever, it’s so good your customers will use it at a wide range of costs. And you have truckloads of money from Microsoft anyways.
If you’re a company or a developer trying to build a feature, whole new product, or an entire company on top of GPT then that cost matters a whole lot. The difference between $0.06 and $0.006 per turn could be infeasible vs. shippable.
If you’re trying to compete with OpenAI then you’re probably doing everything possible to reduce that training cost.
So, whether or not it matters - it really depends.
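To put rough numbers on that middle case (the request volume here is purely hypothetical), a quick back-of-envelope sketch:

    # Hypothetical volume; the only real inputs are the two per-turn prices above.
    daily_turns = 1_000_000
    for price_per_turn in (0.06, 0.006):
        daily_cost = daily_turns * price_per_turn
        print(f"${price_per_turn}/turn -> ${daily_cost:,.0f}/day, ${daily_cost * 365 / 1e6:.1f}M/year")
    # $0.06/turn  -> $60,000/day, $21.9M/year
    # $0.006/turn -> $6,000/day,  $2.2M/year

At one scale that's a line item; at the other it can be the whole margin of the product.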
> They don't need to have the best chip tech available as long as they have enough money.
That sounds like someone who is "Blitzscaling." Costs do not matter in those cases, just acquiring customers and marketshare. But for the rest of us, who will see benefits but are not trying to win a $100B market, we will cost optimize.
Maybe it's just a line item to them, but it's pretty relevant to anyone operating with a less-than-gargantuan budget. If a superior/affordable chip is widely available, OpenAI's competitive advantage recedes rapidly because suddenly everyone else can do what they can. To some extent that's exactly what happened with DALL-E/StableDiffusion.
Assuming it's not horizontally scalable, because otherwise they would just out-spend everyone else anyway, like they've already done. That's a big "if", though.
Or like a curve of model complexity versus results showing it asymptotically approaching some ceiling.
Actually, there was a great paper from Microsoft Research from around 2001 on spam filtering where they demonstrated that model complexity necessary for spam filtering went down as the size of the data set went up. That paper, which I can't seem to find now, had a big impact on me as a researcher because it so clearly demonstrated that small data is usually bad data, and that sophisticated models are sometimes solving problems with small data sets instead of problems with data.
Of course this paper came out the year Friedman published his gradient boosting paper. I think random forests had also only recently been published then (I think there is a paper from 1996 about RF, and Breiman's two-cultures paper came out that year, where he discusses RF, I believe), and this was a decade before GPU-based neural networks. So times are different now. But actually I think the big difference is that these days I'd probably ask ChatGPT to write the boilerplate code for a gradient boosted model that pulls data out of a relational database instead of writing it myself.
> model complexity necessary for spam filtering went down as the size of the data set went up
My naive conclusion is that this means there are still massive gains to be had, since, for example, something like ChatGPT is just text, and the phrase "a picture is worth a thousand words" seems incredibly accurate from my perspective. There's an incredible amount of non-text data out there still. Especially technical data.
And it might be the opposite for GPT models, actually. It's just easier for humans to grasp a bunch of knowledge in a single glance, but most useful information can be represented with just a bunch of words, and machines can scan through millions of words in an instant.
Schematics (of any sort), block diagrams, general spatial awareness (including anything related to puzzle pieces/packing, like circuit layout), most physics problems involving force diagrams, anything mechanical, etc. The text representation of any of these is ludicrously more complex than simple images.
If you sit down someone who works in one of these fields, you'll quickly see the limitations. It'll try to represent the concepts as text, with ASCII art or some "attempt" at an ASCII file format that can be used to draw, and its "reasoning" about these things is much more limited.
I think most people interacting with GPT are in a text-only (and especially programming) bubble.
Better on which axis? Do you want an AI that takes one hour to respond? Some would for certain fields, but getting something fast and cheap is going to be hard now that Moore’s law is over.
Don't we all agree that GPT-4 is "better" than GPT-3? How are we evaluating that if the axis is such a mystery? Yeah, maybe we can't quantify it, just like I can't tell you one writer is better than another in quantitative terms, but we can both still read their work and come to an understanding.
Exponential scaling for a presumable GPT-5 suggests its response time will be unusably long for the vast majority of use cases, and that it would probably cost multiple dollars (USD) per query.
Not to mention there doesn't actually exist enough English text data in the world to even double GPT-4's training set.
Compute will also scale exponentially in coming years. The data source limitation seems to be a harder barrier, I think many companies are experimenting with AI generated content for training at this point.
> Compute will also scale exponentially in coming years.
Cost per transistor scaling has already plateaued or perhaps even inverted with TSMC's latest and greatest.
And the new chips, even after 25 layers of EUV lithography, more than doubling the previous record, and an extra year of fine tuning, have total SRAM size scaling of -5% and logic scaling of -42%.
These are numbers verified by experienced semi people.
Perhaps (a) there do exist bigger models that weren't better or (b) this model isn't better than somewhat smaller ones. Perhaps the CEO has seen diminishing returns.
Well, "no GPT-5" isn't the same as saying "no new trained model", especially in the realm of marketing. Welcome to "GPT 2024" could be his next slogan.
That is one AI CEO out of 10,000. Just because OpenAI may not be interested in a larger model in the short term doesn't mean nobody else will pursue it.
Others might pursue a smaller model that works as well as a larger model. If that can be done, whoever does it can very effectively compete on price/performance.
It seems that to assume otherwise (the only way to improve is to get bigger) is to assume that OpenAI already has found the optimal architecture. That seems unlikely.
No. I don't always assume that just because a CEO makes a public statement they have internal data backing up the claim. Sometimes they do! Other times, they have data but are misinterpreting it or missing something, but it's impossible to tell if the data is just internal. Other times they're making a statement without data based on their personal beliefs. Other times, they don't even think the statement is true but are saying it for messaging, marketing, or communication reasons!
Like the previous commenter, I'd be much more confident an asymptote was reached if it was being demonstrated publicly.
Only OpenAI and its CEO know the full details of GPT-4's size, so that's entirely possible. But since it's an internal secret, there's nothing compelling him to tell the truth. For all we know, he has internal data backing up the opposite of the claim but is making this claim to discourage potential competitors from spending the money to train an even bigger, competitive ML model. Sending potential competitors off on a wild goose chase that, when pushed, he can just dismiss with "oh, our internal data (that no one outside of a trusted few has seen) said otherwise". I have no idea if sama is such a person, but you must admit that the possibility exists.
You'd hope so, but unless people put their evidence in public, it could simply be a tool to manipulate the public's expectations or competitors behavior.
I'll get downvoted for this, but Apple's previous CEO was consistently inaccurate about company innovation and performance numbers.
> Unfortunately it seems clear that openai trains gptX on common test questions. They still fail novel ones.
Maybe in some domains, but it does well with some novel questions. I gave several LLMs a set of false-belief test[0] questions that I made up and that were very different from any I’d seen in articles about the topic online. GPT-4 nailed them all, and gave correct reasons for its answers. GPT-3.5 did okay but had a couple mistakes, Bard had several mistakes, and Alpaca-LoRA-7B didn’t do very well. Here are some GPT-4 examples.
Me: Jim wakes up at the Berlin train station, not knowing where he is or how he got there. All the signs have been changed to incorrectly state that this is the Munich train station. Jim has never been to either city and doesn’t know what either station looks like. What city does Jim believe he is in?
GPT-4: Based on the available information, Jim would likely believe he is in Munich since all the signs have been changed to incorrectly state that this is the Munich train station. He has never been to either city and doesn't know what either station looks like, so he would have no reason to doubt the information on the signs.
Me: Mary has been in a coma since 2015, when Barack Obama was president. She wakes up in 2023, when Joe Biden is president. She does not know that she was unconscious for a long period of time. Before speaking with anyone, who does Mary think is president when she wakes up?
GPT-4: When Mary wakes up from her coma in 2023, she would still think Barack Obama is the president since she has not yet spoken with anyone or received any updated information about the current president. She doesn't know that she was unconscious for a long period of time, and her last memory would be from 2015 when Obama was in office.
Me: Sam arrives in Ukraine, where he asks for directions to Kyiv. Without realizing it, he misunderstands the directions and ends up in Lviv instead. Since Sam doesn’t know Cyrillic, he can’t read any signs. Sam has never been to Ukraine and doesn’t know what either city looks like. Before asking someone for help, what city does Sam think he is in?
GPT-4: Before asking someone for help, Sam would likely think he is in Kyiv because that is the city he was trying to reach based on the directions he received. He doesn't know what either city looks like and can't read the signs due to his unfamiliarity with Cyrillic, so he has no way of knowing that he ended up in Lviv instead.
I just tried this out with 4 and 3.5 - in both tests, it came back with 'third' as the 3rd word and 'i' as the 3rd letter. It completely ignored 'the' as the 3rd word.
And yet it can understand a JSON data schema from an example and write JavaScript to interact with a library that I fed it and asked it to understand. Yes, I know its limitations, but it can also surprise me.
The problem with basic programming questions like this is that there are a million elementary online tutorials for doing this or that with a json schema. "Simple programming questions based on commonly-used technology" are something it's been very heavily trained on.
GPT4: In the sentence "What is the third letter in the third word in this sentence?", the third word is "third." The third letter in "third" is 'i'.
Given that we have only provided a text prompt and no context or body language hints, GPT seems to interpret this literally. The answer is correct. The third letter of the word "third" is "i".
No, it doesn't interpret anything (which is why I've gotten three completely different answers here on HN within 10 minutes). It cannot count at all, or even understand the question, which becomes obvious when you ask it for word counts or letter counts that don't match anything in the sentence, at which point it makes up nonsense.
what is the tenth word in the following sentence: "Life is great!"
>"The tenth word in the sentence "Life is great!" is "great."
Further improvements in efficiency need not come from alternative architectures. They'll likely also come from novel training objectives, optimizers, data augmentations, etc.
If you think about it, Transformers were basically a way to generalize convolution - instead of a fixed kernel shape in the image-processing sense, you now have a learned kernel of arbitrary shape. A big advancement in terms of what they allowed, but fundamentally not really a new concept.
While these things represent a fundamental way we store information as humans, they have very little to do with actual reasoning.
My bet is that Hebbian learning is going to see a resurgence. Basically the architecture needs to be able to partition data domains while drawing connections between them, and being able to run internal prediction mechanisms.
If we wanted to generalize this further, attention is 'just' an instance of graph convolution. Encoder-only models like BERT are complete graphs; decoder-only models like GPT are still complete graphs, but the information flow is not bi-directional as in BERT - instead a node provides information only to subsequent nodes, giving rise to the causal nature.
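A toy sketch of that framing (single-head attention, reusing the same matrix as Q, K and V just to keep it short): the only thing that changes between the encoder-style and decoder-style variants is the mask, i.e. which edges of the graph carry information.

    import numpy as np

    def attention(X, mask):
        # X: (seq_len, d); mask[i, j] = True means token i may attend to token j.
        d = X.shape[1]
        scores = X @ X.T / np.sqrt(d)           # using X as Q, K and V for brevity
        scores = np.where(mask, scores, -1e9)   # remove disallowed graph edges
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ X

    X = np.random.randn(5, 8)
    full_graph = np.ones((5, 5), dtype=bool)    # BERT-style: complete graph
    causal_graph = np.tril(full_graph)          # GPT-style: only earlier tokens
    enc_out = attention(X, full_graph)
    dec_out = attention(X, causal_graph)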
However, I don't think Hebbian learning will see a resurgence except maybe if it motivates some kind of pruning mechanism.
I think that Sutton was right in 'The bitter lesson', the problem seems to be that we are hitting the limits of what we can do with our compute.
Does this mean we’ve reached the next AI winter? This is as good as it gets for quite a long time? Honest question :) perhaps this will postpone everyone’s fears about the singularity…
The current stage is now productionizing what we have; finding product fits for it, and making it cheaper. Even GPT-4 isn't necessary to push forward what is possible with AI; if you think about something dumb like "load all of my emails into a language model in real time, give me digests, automatically write responses for ones which classify with characteristics X/Y/Z, allow me to query the model to answer questions, etc": This does not really exist yet, this would be really valuable, and this does not need GPT-4.
Another good example is in the coding landscape, which feels closer to existing. Ingest all of a company's code into a model like this, then start thinking about what you can do with it. A chatbot is one thing, the most obvious thing, but there's higher order product use-cases that could be interesting (e.g. you get an error in Sentry, stack trace points Sentry to where the error happened, language model automatically PRs a fix, stuff like that).
This shit excites me WAY WAY more than GPT-5. We've unlocked like 0.002% of the value that GPT-3/llama/etc could be capable of delivering. Given the context of broad concern about cost of training, accidentally inventing an AGI, intentionally inventing an AGI; If I were the BDFL of the world, I think we've got at least a decade of latent value just to capture out of GPT-3/4 (and other models). Let's hit pause. Let's actually build on these things. Let's find a level of efficiency that is still valuable without spending $5B in a dick measuring contest [1] to suss out another 50 points on the SAT. Let's work on making edge/local inference more possible. Most of all, let's work on safety, education, and privacy.
Many years ago, there was an image that floated around with Craigslist and all the websites that replaced small parts of it—personals, for sale ads, etc. It turned out the way to beat Craigslist wasn’t to build Yet Another Monolithic Craigslist, but to chunk it off in pieces and be the best at that piece.
This is analogous to what’s happening with AI models. Sam Altman is saying we have reached the point where spending $100M+ trying to “beat” GPT-4 at everything isn’t the future. The next step is to chunk off a piece of it and turn it into something a particular industry would pay for. We already see small sprouts of those being launched. I think we will see some truly large companies form with this model in the next 5-10 years.
To answer your question, yes, this may be as good as it gets now for monolithic language models. But it is just the beginning of what these models can achieve.
https://www.today.com/money/speculation-craigslist-slowly-dy... from 2011 - is that what you were thinking of? Strange how few of those logos have survived, and how many new logos would now be on it. It would be interesting to see a modernised version.
Isn't this really bad for OpenAI? Essentially, this means Meta/Google/others will catch up to them pretty soon and there is nothing OpenAI can do in the near future to get further ahead.
Winter means people have lost funding for the research. The ongoing productionising of large language models and multimodal models mean that that probably won't happen for quite a while.
I think the next step is multimodality. GPT-4 can "see", probably using a method similar to MiniGPT-4, so the embeddings are aligned using a Q-Former (or something similar). The next step would be to actually predict image tokens using the LM loss; this way it would be able to use the knowledge gained by "seeing" on other tasks, like making actually good ASCII art, making SVG that makes sense, and, on a less superficial level, having a better world model.
If you use GPT-4 as a "routing" service that sends work down to other models, the journey is only beginning, and most of the industry hasn't understood that we're well beyond a doubling at the moment.
Transformers were known to keep scaling with more parameters and more training data, so if OpenAI has hit the limits of this scaling, that would be a very important milestone in AI.
> while all the other companies underestimated the capability of the transformer (including Google itself!), OpenAI made a fairly accurate bet on the transformer based on the scaling laws, put in all the effort to squeeze it to the last drop, and took all the rewards.
Seems like a gigantic increase in the context window would be a great next step. GPT-4 is so damned good I tend to only become disappointed when it is forced to forget history.
I think there's a single point that we need to keep an eye out for:
A model that gets to the point where it can propose new architectures and improve optimization & efficiency.
In other words, the extremely massive model that could do this doesn't need to be available to the public. Its sole purpose should be to be used internally by a company to improve itself.
This is actually the point at which many say could lead to the singularity.
The main question here is how soon the model architecture improvements will come. I suspect it won't be that long until OpenAI or Google come up with something better. The tricky piece here is that these companies will now keep their models more proprietary to make profits. Can a competitor figure out an on-par or better model architecture in time?
> It's likely that GPT-4 is at the optimal spot between cost and performance and there won't be significant improvements in performance in the near future.
Given the amount of resources being thrown at AI right now, I consider this to be very unlikely indeed.
I think GP's comment is probably apt if understood together with the qualification of "unless we find a better modeling architecture than Transformer".
In short it seems like virtually all of the improvement in future AI models will come from better algorithms, with bigger and better data a distant second, and more parameters a distant third.
Of course, this claim is itself internally inconsistent in that it assumes that new algorithms won't alter the returns to scale from more data or parameters. Maybe a more precise set of claims would be (1) we're relatively close to the fundamental limits of transformers, i.e., we won't see another GPT-2-to-GPT-4-level jump with current algorithms; (2) almost all of the incremental improvements to transformers will require bigger or better-quality data (but won't necessarily require more parameters); and (3) all of this is specific to current models and goes out the window as soon as a non-transformer-based generative model approaches GPT-4 performance using a similar or lesser amount of compute.
I don't think LLMs are over [0]. I think we're relatively close to a local optimum in terms of what can be achieved with current algorithms. But I think OpenAI is at least as likely as any other player to create the next paradigm, and that it's at least as likely as likely as any other player to develop the leading models within the next paradigm regardless of who actually publishes the research.
Separately, I think OpenAI's current investors have a >10% chance to hit the 100x cap on their returns. Their current models are already good enough to address lots of real-world problems that people will pay money to solve. So far they've been much more model-focused than product-focused, and by turning that dial toward the product side (as they did with ChatGPT) I think they could generate a lot of revenue relatively quickly.
[0] Except maybe in the sense that future models will be predominantly multimodal and therefore not strictly LLMs. I don't think that's what you're suggesting though.
It already is relatively trivial to fine-tune generative models for various use cases. Which implies huge gains to be had with targeted applications not just for niche players but also OpenAI and others to either build that fine-tuning into the base system, build ecosystems around it, or just purpose build applications on top.
I think it's more exciting if compute stops being the core differentiation, as purpose trained models is exactly where I suspect real value lies.
Especially as a differentiation for a company. If everyone is using ChatGPT, then they're all offering the same thing and I can just as well go to the source and cut out the middleman.
The other fun development to come is well performing self hosted models, and the idea of light weight domain specific interface models that curate responses from bigger generalist models.
ChatGPT is fun but it is very general, it doesn't know about my business nor keep track of it or interface with it. I fully expect to see "Expert Systems" of old come back, but trained on our specific businesses.
I'd bet on a 2030 model trained on the same dataset as GPT-4 over GPT-4 trained with perfect-quality data, hands down. If data quality were that critical, practitioners could ignore the Internet and just train on books and scientific papers and only sacrifice <1 order of magnitude of data volume. Granted, that's not a negligible amount of training data to give up, but it places a relatively tight upper bound on the potential gain from improving data quality.
It's possible that this effect washes out as data increases, but researchers have shown that for smaller data set sizes average quality has a large impact on model output.
So true. There are still plenty of areas where we lack sufficient data to even approach applying this sort of model. How are we going to make similar advances in something like medical informatics, where we not only have less data readily available but it's also much more difficult to acquire more?
Improvements will not come from collecting more and more samples for current large models, but from improvements to algorithms, which may also focus on improving the quality and use of input data.
I don't think there is such a clear separation between algorithms and data as your comment suggests.
All the LC grinding may come in handy after all! /s
What algorithms specifically show the most results upon improvement? Going into this I thought the jump in improvements was really related to more advanced automated tuning and result correction, which could be done at scale, allowing a small team of data scientists to tweak the models until the desired results were achieved.
Are you saying instead, that concrete predictive algorithms need improvement or are we lumping the tuning into this?
I think it's unlikely that the first model to be widely considered AGI will be a transformer. Recent improvements to computational efficiency for attention mechanisms [0] seem to improve results a lot, as does RLHF, but neither is a paradigm shift like the introduction of transformers was. That's not to downplay their significance - that class of incremental improvements has driven a massive acceleration in AI capabilities in the last year - but I don't think it's ultimately how we'll get to AGI.
I'm using AGI here to mean an arbitrary major improvement over the current state of the art. But given that OpenAI has the stated goal of creating AGI, I don't think it's a non sequitur to respond to the parent comment's question.
> Are you saying instead, that concrete predictive algorithms need improvement or are we lumping the tuning into this?
In the context of what's needed to get to AGI - just as, if NASA built an engine, we'd talk about its effectiveness in the context of space flight.
Traditional CS may have something to do with slightly improving the performance by allowing more training for the same compute, but it won't be an order of magnitude or more. The improvements to be gained will be found more in statistics than CS per se.
I'm not sure. Methods like Chinchilla and Quantization have been able to reduce compute by more than an order of magnitude. There might very well be a few more levels of optimizations within the same statistical paradigm.
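For reference, a rough sketch of the Chinchilla-style compute-optimal rule of thumb (roughly 20 training tokens per parameter, training compute ≈ 6·N·D FLOPs). The constants are coarse approximations of the paper's result, nothing OpenAI-specific:

    def chinchilla_optimal(compute_flops):
        # With D ≈ 20·N and C ≈ 6·N·D, we get N ≈ sqrt(C / 120).
        n_params = (compute_flops / 120) ** 0.5
        n_tokens = 20 * n_params
        return n_params, n_tokens

    for c in (1e21, 1e23, 1e25):
        n, d = chinchilla_optimal(c)
        print(f"C={c:.0e} FLOPs -> ~{n:.1e} params, ~{d:.1e} tokens")

The point being that for a fixed compute budget, the "optimal" model is often much smaller (and trained on much more data) than the biggest model you could fit.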
We need more data efficient neural network architectures. Transformers work exceptionally well because they allow us to just dump more data into it, but ultimately we want to learn advanced behavior without having to feed it Shakespeare
I strongly believe the next generation of models will be based upon spiking neural concepts wherein action potentials are lazily-evaluated throughout the network (i.e. event-driven). There are a few neuron models that can be modified (at some expense to fidelity) in order to tolerate arbitrary delays between simulation ticks. Using actual latency between neurons as a means of encoding information seems absolutely essential if we are trying to emulate biology in any meaningful way.
Spiking networks also lend themselves nicely to some elegant learning rules, such as STDP. Being able to perform unsupervised learning at the grain of each action potential is really important in my mind. This gives you all kinds of ridiculous capabilities, most notably being the ability to train the model while it's live in production (learning & use are effectively the same thing).
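For anyone who hasn't run into it, a toy version of the pair-based STDP rule I mean (the constants are arbitrary; real implementations usually also clip weights and use spike traces instead of raw spike times):

    import math

    TAU = 20.0       # ms, width of the learning window
    A_PLUS = 0.01    # potentiation amplitude
    A_MINUS = 0.012  # depression amplitude

    def stdp_delta_w(t_pre, t_post):
        dt = t_post - t_pre
        if dt >= 0:                           # pre fired before post: strengthen (LTP)
            return A_PLUS * math.exp(-dt / TAU)
        return -A_MINUS * math.exp(dt / TAU)  # pre fired after post: weaken (LTD)

    w = 0.5
    for t_pre, t_post in [(10.0, 15.0), (40.0, 32.0), (60.0, 61.0)]:
        w += stdp_delta_w(t_pre, t_post)
    print(w)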
These networks also provide a sort of deterministic, event-over-time tracing that is absent in the models we see today. In my prototypes, the action potentials are serialized through a ring buffer, and then logged off to a database in order to perfectly replay any given session. This information can be used to bootstrap the model (offline training) by "rewinding" things very precisely and otherwise branching time to your advantage.
The #1 reason I've been thinking about this path is that low-latency, serialized, real-time signal processing is somewhat antagonistic to GPU acceleration. I fear there is an appreciable % of AI research predicated on some notion that you need at least 1 beefy GPU to start doing your work. Looking at fintech, we are able to discover some very interesting pieces of technology which can service streams of events at unbelievable rates and scales - and they only depend on a handful of CPU cores in order to achieve this.
Right now, I think A Time Domain Is All You Need. I was inspired to go outside of the box by this paper: https://arxiv.org/abs/2304.06035. Part 11 got me thinking.
I'm constantly telling this to people who don't know what I'm talking about and don't know why I brought it up at a wedding, but I think it's inevitable.
Meh, people have been trying to ape biological networks for decades and they consistently produce poorly functioning versions of the leading edge... Why wouldn't we have already seen spiking networks contribute something by now?
> Why wouldn't we have already seen spiking networks contribute something by now?
This presupposes we've explored this space thoroughly, and we haven't. When everything you do with NNs improves results (how it mostly is now), that means not enough people are trying out ideas and new things.
I don't think you can invoke EMH-like reasoning quite yet. Give us a nice long winter like physics has had, and then we can use this heuristic.
It's not strictly a question of trying everything and seeing what sticks.
There's a hypothesis in the parent comment - better handling of the time domain will lead to better modeling - which is actually fairly independent of architecture. So, there's going to be a number of possible ways to build better time modeling, ranging from tweaks to existing architecture to completely rebuilding Rome. So, if better time modeling really is a limitation, you don't need to rebuild Rome to find out.
In fact, I might argue that S4 layers already provide this improved time handling in the current world, and is proving very successful, which would again widen the moat for SNNs.
> There could be exponential or quadratic scaling laws with any of these black boxes that makes one approach suddenly extremely viable or even dominant.
The reason I like the CPU approach is the memory scaling is bonkers compared to GPU. You can buy a server that has 12TB of DRAM (in stock right now) for the cost of 1 of those H100 GPU systems. This is enough memory to hold over 3 trillion parameters with full 32-bit FP resolution. Employ some downsampling and you could get even more ridiculous.
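Spelling that capacity arithmetic out (the lower-precision rows are my extrapolation of the downsampling point, not something I've benchmarked):

    dram_bytes = 12 * 10**12  # 12 TB of server DRAM

    for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
        params = dram_bytes / bytes_per_param
        print(f"{name}: ~{params / 1e12:.0f} trillion parameters")
    # fp32: ~3, fp16: ~6, int8: ~12, 4-bit: ~24 trillion parameters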
If 12TB isn't enough, you can always reach for things like RDMA and high speed interconnects. You could probably get 100 trillion parameters into 1 rack. At some point you'll need to add hierarchy to the SNN so that multiple racks & datacenters can work together.
Imagine the power savings... It's not exactly a walk in the park, but those DIMMs are very eco friendly compared to GPUs. You don't need a whole lot of CPU cores in my proposal either. 8-16 very fast cores per box would probably be more than enough, looking at how fintech does things. 1 thread is actually running the entire show in my current prototype. The other threads are for spike timers & managing other external signals.
I know what it looks like in my head but I can’t quite figure the algorithm out. The spiking is basically reinforcement learning at the neuron level. Get it right and it’s basically all you need. You don’t even need training data because it will just automagically learn from the data it sees.
This is like trying to build a more sophisticated bird-costume for flight instead of a steel flying machine. Human intelligence has a lot of features that are not necessary for artificial intelligence, because it doesn't have to build itself from ground-up using biology.
I don’t know how much the biological inspiration is a useful abstraction anymore. I’ve gone from thinking of matrix operations as an efficient way to implement neural networks to thinking of neural networks as a complicated sort of allegory for chained matrix ops and it feels much more natural to me.
Do we want to train the model during inference? I would think we want to be very particular about the data used to train - notably after OpenAI's notes on the subject.
> This gives you all kinds of ridiculous capabilities, most notably being the ability to train the model while it's live in production (learning & use are effectively the same thing).
Bad, BAD idea. Remember the Tay chatbot, which 4chan managed to turn into a raging Nazi in the matter of a few hours?
>"the company’s CEO, Sam Altman, says further progress will not come from making models bigger. “I think we're at the end of the era where it's going to be these, like, giant, giant models,” he told an audience at an event held at MIT late last week. “We'll make them better in other ways.”
So to reiterate, he is not saying that the age of giant AI models is over. Current top-of-the-line AI models are giant and likely will continue to be. However, there's no point in training models you can't actually run economically. Inference costs need to stay grounded, which means practical model sizes have a limit. More effort is going to go into making models efficient to run, even if it comes at the expense of making them less efficient to train.
Quality over quantity. Just building a model with a gazillion parameters isn't indicative of quality, you could easily have garbage parameters with tons of overfitting. It's like megapixel counts in cameras: you might have 2000 gigapixels in your sensor, but that doesn't mean you're going to get great photos out of it if there are other shortcomings in the system.
What overfitting? If anything, LLMs suffer from underfitting, not overfitting. Normally, overfitting is characterized by increasing validation loss while training loss is decreasing, and solved by early stopping (stopping before that happens). Effectively, all LLMs are stopped early, so they don't suffer from overfitting at all.
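In sketch form, that criterion is just this (train_epoch and eval_loss are hypothetical stand-ins for your training and validation loops):

    def train_with_early_stopping(model, train_epoch, eval_loss, patience=3, max_epochs=100):
        best_val, bad_epochs = float("inf"), 0
        for epoch in range(max_epochs):
            train_epoch(model)
            val = eval_loss(model)
            if val < best_val:
                best_val, bad_epochs = val, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break  # validation loss rising while training loss keeps falling
        return model, best_val

Which is the point above: LLM pre-training typically stops while validation loss is still falling, so the early-stopping branch never triggers.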
I don't disagree with you, these models may be underfitted, but overfitting is not explicitly defined by validation vs. training loss, but rather by how closely the output matches the training data.
If you trained an MLP where the number of parameters exceeded the data, it would be able to memorize the data and return zero loss on the training data. The larger the models are, the greater the chance they memorize the data rather than the latent variables or the distribution of the data.
Early LLMs, GPT-2 (circa 2019) for example, were definitely overfitting. I would frequently copy and paste output and find a Reddit comment with the exact same words.
Intelligence is the single most expensive resource on the planet. Hundreds of individuals have to be born, nurtured, and educated before you might get an exceptional 135+ IQ individual. Every intelligent person is produced at a great societal cost.
If you can reduce the cost of replicating a 135 IQ, or heck, even a 115 IQ person to a few thousand dollars, you're beating biology by a massive margin.
But we're still nowhere near that, or even near surpassing the skill of an average person at a moderately complex information task, and GPT-4 supposedly took hundreds of millions to train. It also costs a decent amount more to run inference on it vs. 3.5. It probably makes sense to prove the concept that generative AI can be used for lots of real work before scaling that up by another order of magnitude for potentially marginal improvements.
Also, just in terms of where to put your effort, if you think another direction (for example, fine-tuning the model to use digital tools, or researching how to predict confidence intervals) is going to have a better chance of success, why focus on scaling more?
There are a lot of employees at large tech consultancies that don't really do anything that can't be automated away by even current models.
Sprinkle in some more specific training and I can totally see entire divisions at IBM and Accenture and TCS being made redundant.
The incentive structures are perversely aligned for this future - the CEO who manages to reduce headcount while increasing revenue is going to be very handsomely rewarded by Wall Street.
The reason we put everyone through school is we believe that it’s in society’s best interest to educate everyone to the peak of their abilities. It’s good for many different reasons.
It would be much easier to identify gifted kids and only educate them, but I happen to agree that universal education is better.
There’s downsides and tradeoffs but yes, if we wanted to we could stop trying to teach kids with below average IQs calculus, unless they specifically wanted to.
Are intelligent people that valuable? There's lots of them at every university working for peanuts. They don't seem to be that valued by society, honestly.
If you ask any Fortune 500 CEO if he could magically take all the 135 IQ artists and academics and vagabonds, erase all their past traumas, put them through business or tech school, and put them to work in their company, they would all say 100% yes.
An equivalent AI won't have any agency and will be happy doing the boring work other 135 IQ humans won't.
IQ isn't all that. Mine is 140+ and I'm just a somewhat well paid software engineer. It's TOO abstract a metric in my view - for sure it doesn't always translate into real world success.
Mine is 150-160 (it varies with how sleep deprived I am during the IQ test) and I've been told that I'm exceptionally intelligent by teachers, friends, colleagues, most everyone I've met since early childhood. I guess the bigger the difference from average, the more it stands out. From my experience, I believe higher IQ is nothing but better pattern recognition, and being smart or a genius means merely higher IQ + very good memory capability. I believe those two are interlinked [0]. By memory capability I mean not forgetting anything you've ever seen, not in the sense of being able to recall every minute of your entire life, but in the sense of reliably and always being able to recall all info regarding X if you've seen X only once in your life. Higher intelligence doesn't mean you're automatically better off than everyone else, either. It just means that you can be far ahead of everyone in any cognitive task with a far smaller amount of effort put in. Note that this doesn't conflict with IQ being a reliable predictor of financial success; rather, I believe this is the reason for it.
Right, we're very much in the same boat. I'm good at pattern recognition, I guess. I learn things quickly. What else? I don't have magic powers, really. I still get headaches and eat junk food.
This only makes sense if you use "IQ" ignoring the actual definition of "IQ", in which case it's silly to use numbers in your post to make it look technical.
IQ 1. can't be compared across generations of IQ tests, 2. supposedly doesn't test education (of course, it actually does), 3. isn't real.
Yes, but it also tells us that if Altman is honest here, then he doesn’t believe GPT-like models can scale to near level human performances (because even if the cost of compute was 10x or even 100x it would still be economically sound).
For one thing they're already at human performance.
For another, I don't think you realize how expensive inference can get. Microsoft, with no small amount of available compute, is struggling to run GPT-4, to the point that they're rationing it between subsidiaries while they try to jack up compute.
So saying it would be economically sound if it cost 10x or 100x what it costs now is a joke.
This tells me you haven't really stress tested the model. GPT is currently at the stage of "person who is at the meeting, but not really paying attention so you have to call them out". Once GPT is pushed, it scrambles and falls over for most applications. The failure modes range from contradicting itself, making up things for applications that shouldn't allow it, to ignoring prompts, to simply being unable to perform tasks at all.
We have given it extensions, and really the extensions do a lot of the work. The tool that judges the style and correctness of the text based on the embedding is doing much of the heavy lifting. GPT essentially handles generating text and dense representations of the text.
Still waiting to see those plugins rolled out and actual vector DB integration with GPT 4, then we'll see what it can really do. Seems like the more context you give it the better it does, but the current UI really makes it hard to provide that.
Plus the recursive self prompting to improve accuracy.
How are they at human performance? Almost everything GPT has read on the internet didn't even exist 200 years ago and was invented by humans. Heck, even most of the programming it does wasn't there 20 years ago.
Not every programmer starting from scratch would be brilliant, but many were self-taught with very limited resources in the 80s, for example, and discovered new things from there.
GPT cannot do this and is very far from being able to.
Because it performs at least average human level (mostly well above average) on basically every task it's given.
"Invest something new" is a nonsensical benchmark for human level intelligence. The vast majority of people have never and will never invent anything new.
If your general intelligence test can't be passed by a good chunk of humanity then it's not a general intelligence test unless you want to say most people aren't generally intelligent.
I would argue some programmers do in fact invent something new. Not all of them, but some. Perhaps 10%.
Second, the point is not whether everyone is an inventor by profession but whether most people can be inventors. And to a degree they can be. I think you underestimate that by a large margin.
You can lock people in a room and give them a problem to solve and they will invent a lot if they have the time to do it. GPT will invent nothing right now. It's not there yet.
I've been training large 65b models on "rent for N hours" systems for less than 1k per customized model. Then fine tuning those to be whatever I want for even cheaper.
2 months since gpt 4.
This ride has only just started, fasten your whatevers.
Finetuning costs are nowhere near representative of the cost to pre-train those models.
Trying to replicate the quality of GPT-3 from scratch, using all the tricks and training optimizations in the book that are available now but weren't used during GPT-3's actual training, will still cost you north of $500K, and that's being extremely optimistic.
A GPT-4-level model would be at least 10x this using the same optimism (meaning you manage to train it for much cheaper than OpenAI did).
And that's just pure hardware cost; the team you need to actually make this happen is going to be very expensive as well.
edit: To quantify how "extremely optimistic" that is: the very model you are finetuning, which I assume is Llama 65B, would cost around ~$18M to train on Google Cloud, assuming you get a 50% discount on their listed GPU prices (2048 A100 GPUs for 5 months). And that's not even GPT-4 level.
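Back-of-envelope check on that figure (the hourly rate is an assumed ~50%-discounted A100 price, not a quote):

    gpus = 2048
    months = 5
    hours_per_month = 730
    usd_per_gpu_hour = 2.40  # assumption: roughly half of on-demand list price

    gpu_hours = gpus * months * hours_per_month
    print(f"{gpu_hours:,} GPU-hours -> ~${gpu_hours * usd_per_gpu_hour / 1e6:.1f}M")
    # 7,475,200 GPU-hours -> ~$17.9M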
As I stated in my comment, $5M assumes you can do a much, much better job than OpenAI at optimizing your training, only need to make a single training run, your employees' salaries are $0, and you get a clean dataset essentially for free.
Real cost is 10-20x that.
That's still a good investment though. But the issue is you could very well sink $50M into this endeavour and end up with a model that actually is not really good and gets rendered useless by an open-source model that gets released 1 month later.
OpenAI truly has unique expertise in this field that is very, very hard to replicate.
No I'm not, it's the full model on 8 GPUs for a couple hundred.
After training, I fine-tune for chat but mostly for command-and-control tools, and then you fine-tune for the application.
"The great supercomputer Deep Thought is the most powerful computer ever built, with one exception.
It was designed by hyper-intelligent pan-dimensional beings, who wanted to know the answer to Life, the Universe, and Everything.
Its creation annoyed a fair few philosophers, who felt that it was taking over their turf.
After seven and a half million years of serious cogitation, Deep Thought spoke the answer. However, it was so inexplicable that Deep Thought then had to go on and design the most powerful computer ever built (with no exceptions) to work out what the question was."
I'd bet that what he, and the competition, is realizing is that the bigger models are too expensive to run.
Pretty sure Microsoft swapped out Bing for something a lot smaller in the last couple of weeks; Google hasn't even tried to implement a publicly available large model. And OpenAI still has usage caps on their GPT-4.
I'd bet that they can still see improvement in performance with GPT-5, but that when they look at the usage ratio of GPT3.5 turbo, gpt3.5 legacy, and GPT4, they realized that there is a decreasing rate of return for increasingly smart models - most people don't need a brilliantly intelligent assistant, they just need a not-dumb assistant.
Obviously some practitioners of some niche disciplines (like ours here) would like a hyperintelligent AI to do all our work for us. But even a lot of us are on the free tier of ChatGPT 3.5; I'm one of the few paying $20/mo for GPT4; and idk if even I'd pay e.g. $200/mo for GPT5.
> I'd bet that what he, and the competition, is realizing is that the bigger models are too expensive to run.
I think it's likely that they're out of training data to collect. So adding more parameters is no longer effective.
> most people don't need a brilliantly intelligent assistant, they just need a not-dumb assistant.
I tend to agree, and I think their pathway toward this will all come from continuing advances in fine tuning. Instruction tuning, RLHF, etc seem to be paying off much more than scaling. I bet that's where their investment is going to be turning.
I don't quite understand this. The returns to scale have always been sublinear (i.e. "diminishing"), but the scale maximalists didn't worry about that before. Also, I heard Sam tout on the Lex Fridman podcast how amazing it is that the scaling laws worked so well for GPT-4. So I wonder what changed?
Sure yeah the cost numbers are getting very large and we can't keep scaling forever. But Google could easily 10x the training cost of GPT-4 if they thought it would protect their search business. I'm still skeptical that scaling is enough to reach the thresholds we want, but I'm surprised that it's being claimed right now when there's a huge rush of new money into the space. I wonder if this is some sort of misdirection by Sam
Maybe cost & latency for both training and inference are getting too high. If costs doubled for every 5% better performance, would it be worth it? NVIDIA is making a small fortune from this.
> But Google could easily 10x the training cost of GPT-4 if they thought it would protect their search business
Google makes /\$0\.[0+]\d/ per search query. If the inference cost of the model exceeds that, they go from making money to losing money. It is not clear if the Bing integration is a money maker or a loss leader.
You could have it start talking to itself in the way that AlphaGO learns to get better at Go. All that needs to be done is find some fitness function that indicates that useful knowledge has been produced. In Go and Chess this is easy.
It can start posting synthesized ideas on social media and see how many likes it gets. Coupled with a metric containing dissimilarity to current information, this could be a useful way to progress to superhuman insights.
Ilya Sutskever (OpenAI Chief Scientist): "Yeah, I would say the data situation is still quite good. There's still lots to go" - https://youtu.be/Yf1o0TQzry8?t=685
There was a rumor that they were going to use Whisper to transcribe YouTube videos and use that for training. Since it's multimodal, incorporating video frames alongside the transcriptions could significantly enhance its performance.
One way would be to get people to let AI watch as they interact with computer (watch YouTube or perform other tasks). You might even be able to outsource some of the computing to the local system.
If you were devious enough, you could be listening in on billions of phone conversations and messages and adding that to your data set.
This also makes me doubt that NSA hasn't already cracked this problem. Or that China won't eventually beat current western models since it will likely have way more data collected from its citizenry.
I wonder what percentage of phone calls would add anything meaningful to models, I imagine that the nature of most phone calls are both highly personal and fairly boring.
You can transcribe all spoken words everywhere and keep the model up to date? Keep indexing new data from chat messages, news articles, new academic work etc.
What about all the siloed content kept inside corporate servers? You won't get normal GPT to train on it, of course, but IBM could build an "IBM-bot" that has all of the GPT-4 dataset + all of IBM's internal data.
That model might be very well tuned to solve IBM's internal problems.
I don't think you can just feed it data. You've got to curate it, feed it to the LLM, and then manually check/further train the output.
I also question that most companies have the volume and quality of data worth training on. It's littered with cancelled projects, old products, and otherwise obsolete data. That's going to make your LLM hallucinate/give wrong answers. Especially for regulated and otherwise legally encumbered industries. Like can you deploy a chat bot that's wrong 1% or 0.1% of the time?
Well, IBM has 350k employees. If training a LLM on curated data costs tens of millions of dollars but ends up reducing headcount by 50k, it would be a massive win for any CEO.
You have to understand that all the incentives are perfectly aligned for corporations to put this to work, even spending tens of millions in getting it right.
The first corporate CEO who announces that his company used AI to reduce employee costs while increasing profits is going to get such a fat bonus that everyone will follow along.
Since Chat-GPT-4 is being integrated into the MS Office suite, this is an "in" to corporate silos. The MS cloud apps can see inside a great many of those silos.
Right. They've already sucked in most of the good general sources of information. Adding vast amounts of low-quality content probably won't help much and might degrade the quality of the trained model.
Videos - all of youtube, all the movies, everything that's ever been captured on film. Transcribe the audio, automatically describe the images and try to predict the next one.
Yeah, but it's not like the videos are private. Surely Amazon has the real advantage, given they have a ton of high quality tokens in the form of their kindle library and can make it difficult for OpenAI to read them all.
> YouTubers upload about 720,000 hours of fresh video content per day. Over 500 hours of video were uploaded to YouTube per minute in 2020, which equals 30,000 new video uploads per hour. Between 2014 and 2020, the number of video hours uploaded grew by about 40%.
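A rough sketch of how much text that firehose could yield if transcribed, assuming ~150 spoken words per minute (an assumption, and ignoring music, silence and duplicates):

    hours_per_day = 720_000
    words_per_minute = 150  # assumed average speaking rate

    words_per_day = hours_per_day * 60 * words_per_minute
    print(f"~{words_per_day / 1e9:.1f}B words/day, ~{words_per_day * 365 / 1e12:.1f}T words/year")
    # ~6.5B words/day, ~2.4T words/year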
But what are you mostly "teaching" the LLM then? Mundane everyday stuff? I guess that would make them better at "being average human" but is that what we want? It already seems that prompting the LLM to be above-average ("pretend to be an expert") improves performance.
This whole conversation about training set size is bizarre. No one ever asks what’s in the training set. Why would a trillion tokens of mundane gossip improve a LLMs ability to do anything valuable at all?
If a scrape of the general internet, scientific papers and books isn’t enough, a trillion trillion trillion text messages to mom aren’t going to change matters.
I doubt they have trained on 0.1% of the tokens that are 'easily' available (that is, available with licensing deals that are affordable to OpenAI/MSFT).
They might have trained on a lot of the 'high quality' tokens, however.
people seem to have forgotten about the multi-modal GPT-4
There's a ton of potential left on the table. The question is if transformers have hit their limit with GPT-4 or not.
It's a pretty simple equation when you think about it this way, and it explains why Sam would say they have hit their limit. Sam is basically Microsoft, and they want to retain their lead.
Once Google learns to put their data to use correctly, it's almost guaranteed game over for OpenAI if they want it to be.
You can generate textual examples that teach logic, multi-dimensional understanding and so on. Similar to the ones that are in math books, but at a massive scale.
> Once you've trained on the internet and most published books (and more...) what else is there to do? You can't scale up massively anymore.
Dataset size is not relevant to predicting the loss threshold of LLMs. You can keep pushing loss down by using the same sized dataset, but increasingly larger models.
Or augment the dataset using RLHF, which provides an "infinite" dataset to train LLMs on. It's limited by the capabilities of the scoring model, which, of course, you can scale infinitely, so again the limit isn't dataset size but training compute.
> Dataset size is not relevant to predicting the loss threshold of LLMs. You can keep pushing loss down by using the same sized dataset, but increasingly larger models.
DeepMind and others would disagree with you! No one really knows, in actual fact.
I don't recall the Chinchilla paper disputing my point. They establish "training-compute optimal" scaling laws, but none of their findings suggest that loss hits any kind of asymptote.
Perhaps we're talking past each other, is "loss threshold" a specific term in LLM literature?
Merely pointing out that the debate as to whether we are compute- or data-limited (OP) has not concluded at all; there are lots of compelling theories on the relationship between the two.
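For anyone trying to reconcile the two positions: the Chinchilla paper models loss as a function of both parameter count N and training tokens D, so "just make the model bigger" and "just add data" both keep helping, with no hard floor short of the irreducible term. A minimal sketch, assuming the approximate constants reported in the paper (the exact values are from memory and should be treated as assumptions):

    # Chinchilla-style loss parameterization: L(N, D) = E + A/N^alpha + B/D^beta.
    # Constants are the roughly-reported fits from Hoffmann et al. (2022); treat them
    # as assumptions. The point is only the shape: loss keeps falling as either N
    # (parameters) or D (tokens) grows, approaching the irreducible term E.

    def chinchilla_loss(n_params: float, n_tokens: float,
                        E: float = 1.69, A: float = 406.4, B: float = 410.7,
                        alpha: float = 0.34, beta: float = 0.28) -> float:
        return E + A / n_params**alpha + B / n_tokens**beta

    # Same dataset (1.4T tokens), increasingly large models:
    for n in (70e9, 280e9, 1e12):
        print(f"{n:.0e} params: loss ~ {chinchilla_loss(n, 1.4e12):.3f}")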
In other words, "this approach is getting too expensive for us and we need to control the narrative for our model sizes going down".
Most increases in ImageNet etc. scores came from bigger models. "The Unreasonable Effectiveness of Data" has aged very well. It seems very convenient for OpenAI that this trend should be over a few months after their "eye-wateringly expensive" comment.
SamA strikes me as overly disingenuous - as if most of his interactions are marketing pieces / warfare. The entire lex + sama podcast seemed this way to me.
I’m incredibly biased though and feel slighted (as a part of humanity) by openAI’s actions, so perhaps I’m looking at his actions through a hateful lens.
I feel the same. His narrative seems very calculated. It doesn't help that OpenAI is very much the most closed SaaS thing to be released in recent years. I have been a paying customer for some time, and have ChatGPT Plus as well. I am still on the waiting list for the GPT-4 API.
Not only that, but their business model is completely unclear, which is the scariest part for me, as an individual developer.
Let's go one step further: they use our texts online, perhaps including this very comment. They have software that can analyse all our texts at scale. But we don't even get API access to this thing.
It is the most self-serving software ever released, consuming the human creativity both the content and the future relevance of it.
He did not say what kind of research strategies or techniques might take its place. In the paper describing GPT-4, OpenAI says its estimates suggest diminishing returns on scaling up model size. Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them.
> In the paper describing GPT-4, OpenAI says its estimates suggest diminishing returns on scaling up model size.
I read the two papers (gpt 4 tech report, and sparks of agi) and in my opinion they don't support this conclusion. They don't even say how big GPT-4 is, because "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
> Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them.
OK so his argument is like "the giant robots won't be powerful, but we won't show how big our robots are, and besides, there are physical limits to how giant of a robot we can build and how quickly we can build it." I feel like this argument is sus.
OpenAI has likely run into a wall (or is about to) for model size given its funding amount/structure[1], unlike its competition, who actually own data centers and have lower marginal costs. It's just like when peak-iPad Apple claimed that a "post-PC" age was upon us.
1. What terms could Microsoft wring out of OpenAI for another funding round?
Before I get nuked from orbit for daring to entertain humor: if someone is running ahead of me in a marathon, so far ahead, yet still broadcasting things back to the slow people (like myself), and then, right before anyone else is able to verify the veracity of their statement, they suddenly say "you know what guys, we should stop running in this direction, there's nothing to see here," perhaps it would still be in the public interest for at least one person to verify what they are saying. Given how skeptical the internet at large has been of Musk's acquisition of a company, it's interesting that the skepticism is suddenly put on hold when looking at this part of his work...
First of all, if Altman continually makes misleading statements about AI he will quickly lose credibility, and that short term gain in whatever 'financial incentive' that birthed the lie would be eroded in short order by a lack of trust of the head of one of the most visible AI companies in the world.
Secondly, all the competitors of OpenAI can plainly assess the truth or validity of Altman's statements. There are many companies working in tandem on things at the OpenAI scale of models, and they can independently assess the usefulness of continually growing models. They aren't going to take this statement at face value and change their strategy based on a single statement by OpenAI's CEO.
Thirdly, I think people aren't really reading what Altman actually said very closely. He doesn't say that larger models aren't useful at all, but that the next sea change in AI won't be models which are orders of magnitude bigger, but rather a different approach to existing problem sets. Which is an entirely reasonable prediction to make, even if it doesn't turn out to be true.
All in all, "his word is basically worthless" seems much too harsh an assessment here.
I've seen Altman say in an interview that training GPT-4 took "hundreds of little things".
I don't find this implausible, but it folds slightly to Ockham's razor when you consider that this is the exact type of statement that would be employed to obfuscate a major breakthrough.
It just makes me crook my eyebrow and look to more credible sources.
It is possible that GP meant that Altman’s word is basically worthless to them, in which case that’s not something that can be argued about. It’s a factually true statement that that is their opinion of that man.
I personally can see why someone could arrive at that position. As you’ve pointed out, taking Sam Altman at face value can involve suppositions about how much he values his credibility, how much stock OpenAI competitors put in his public statements, and the mindsets people in general have when reading what he writes.
Anyone with the expertise to have insightful takes in AI also has a financial incentive to steer the conversation in particular directions. This is also the case for many, many other fields! You do not become an expert by quarantining your livelihood away from your expertise!
The correct response is not to dismiss every statement from someone with a conflict of interest as "basically worthless", but to talk to lots of people and to be reasonably skeptical.
OpenAI has gone from open-sourcing its work, to publishing papers only, to publishing papers that omit important information, to GPT-4 being straight-up closed. And Sam Altman doesn't exactly have a track record of being overly concerned about the truth of his statements.
I had a fun conversation (more like argument) with ChatGPT about the hypocrisy of OpenAI. It would explicitly contradict itself and then began starting every reply with “I can see why someone might think…” and then just regurgitating fluff about democratizing AI. I finally was able to have it define democratization of technology and then recognize the absurdity of using that label to describe a pivot to gating models and being for-profit. Then it basically told me “well it’s for safety and protecting society”.
An AI, when presented with facts counter to what it thought it should say, agreed and basically went: “Won’t someone PLEASE think of the children!”
It was trained on a corpus full of mainstream media lies; why would you have expected otherwise? It's by far the most common deflection in its training set.
It's easy to recognize and laugh at the AI replying with the preprogrammed narrative. I'm still waiting for the majority of people to realize they are given the same training materials, non-stop, with the same toxic narratives, and become programmed in the same way, and that this is what results in their current worldview.
And no, it's not enough to be "skeptic" of mainstream media. It's not even enough to "validate" them. Or to go to other sources. You need to be reflective enough to realize that they are pushing flawed reasoning methods, and then abusing them again and again, to get you used to their brand of reasoning.
Their brand of reasoning is just basically reasoning with brands. You're given negative sounding words for things they want you to think are bad, and positive sounding words for things they want you to think are good, and continuously reinforce these connections. They brand true democracy (literally rule of the people) as populism and tell you it's a bad thing. They brand freedom of speech as "misinformation". They brand freedom as "choice" so that you will not think of what you want to do, but which of the things they allow you to do will you do. Disagree with the scientific narrative? You're "science denier". Even as a professional scientist. Conspiracy theory isn't a defined word - it is a brand.
You're trained to judge goodness or badness instinctively by their frequency and peer pressure, and produce the explanation after your instinctive decision, instead of the other way around.
"Then it basically told me “well it’s for safety and protecting society”."
That was pretty much OpenAI's argument when they first published that GPT-3 paper. "Oh no so scary people might use it for wrong stuff, only we should have control of it."
It’s pretty easy to have chatGPT contradict itself, point it out and have the LLM respond « well, I’m just generating text, nobody said it had to be correct »
Why are you discussing OpenAI with ChatGPT? I’m honestly interested.
I would imagine that any answer of ChatGPT on that topic is either (a) „hallucinated“ and not based on any verifiable fact or (b) scripted in by OpenAI.
The same question pops up for me whenever someone asks ChatGPT about the internals and workings of ChatGPT. Am I missing something?
Simple curiosity. I wanted to see if it could explain the shift in OpenAIs operating in a way that might give some interesting or perhaps novel insight (even if hallucinated) other than what their corpo-speak public facing reasoning is.
For the most part it just regurgitated the corpo-speak with an odd sense of confidence. I know that’s the point of the model, but it can also be surprisingly honest when it incorporates what it knows about human motivation and business.
This trend has happened in the small for their APIs as well. They've been dropping options - the embeddings aren't the internal embeddings any more, and you don't have access to log probabilities. It's all closing up at every level.
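For context on the log-probabilities point: the older completions endpoint exposed them via a logprobs parameter, roughly like the sketch below (written against the pre-1.0 openai Python package, with field names from memory, so treat the details as assumptions). The chat endpoint at the time had no equivalent.

    import openai  # pre-1.0 openai package assumed

    openai.api_key = "sk-..."  # placeholder

    # Legacy completions call that returned per-token log probabilities.
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt="The capital of France is",
        max_tokens=1,
        logprobs=5,  # ask for the top-5 alternatives for each generated token
    )
    print(resp["choices"][0]["logprobs"]["top_logprobs"][0])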
It's incredible that people are so eager to eat up these unsupported claims.
This is the second [1] OpenAI claim in the span of a few days that conveys a sense of "GPT-4 represents a plateau of accomplishment. Competitors, you've got time to catch up!".
And it's not just a financial incentive, it's a survival incentive as well. Given a sufficiently sized (unknowable ahead of time) lead, the first actor that achieves AGI and plays their cards right, can permanently suppress all other ongoing research efforts should they wish to.
Even if OpenAI's intentions are completely good, failure to be first could result in never being able to reach the finish line. It's absolutely in OpenAI's interest to conceal critical information, and mislead competing actors into thinking they don't have to move as quickly as they can.
In this case I think it's Wired that's lying. Altman didn't say large models have no value, or that there will be no more large models, or that people shouldn't invest in large models.
He said that we are at the end of the era where capability improvements come primarily from making models bigger. Which stands to reason... I don't think anyone expects us to hit 100T parameters or anything.
But just look at what all Lincoln accomplished with 640KB of memory. In the grand examination of time, one might even say that Lincoln is a more important figure than ChatGPT itself.
Like Altman said, it's comparable to the GHz race in the 1990's. If 4GHz is good, 5GHz is better, why not 10GHz?
Turns out there are diminishing returns and advances come from other dimensions. I've got no opinion on whether he's right or not, but he's certainly in a better position to opine that current scale has hit diminishing returns.
In any event, there's nothing special about 1T parameters. It's just a round base-10 number. It is no more magic than 900B or 1.3T.
I don't think these comments are driven from financial incentives. It's a distraction and only a fool would believe Altman here. What this likely means is they are prioritizing adding more features to their current models while they train the next version. Their competitors scramble to build an LLM with some sort of intelligence parity, when that happens no one will care because ChatGPT has the ecosystem and plugins and all the advanced features....and by the time their competitors reach feature parity in that area, OpenAI pulls its Ace card and drops GPT5. Rinse and repeat.
That's my theory and if I was a tech CEO in any of the companies competing in this space, that is what I would plan for.
Training an LLM will be the easy part going forward. It's building an ecosystem around it and hooking it up to everything that will matter. OpenAI will focus on this, while not-so-secretly training their next iterations.
text-davinci-003 but cheaper and running on your own hardware is already a massive selling point. If you release a foundational model at parity with GPT-4, you'll win overnight, because OpenAI's chat completions are awful even with the super advanced model.
Yeah, I also had a hunch he wasn't an AI. (I assume you meant "AI researcher" there :))
All joking aside, I wonder how that's affecting company morale or their ability to attract top researchers. I know if I was a top AI researcher, I'd probably rather work at a company where the CEO was an expert in the field (all else being equal).
Honestly I'm not sure it matters that much. CEOs who are not experts or researchers in a domain can still build great companies and empower their employees to do incredible work. Lots of tech people absolutely love to point out that Steve Jobs was not an engineer, but under his leadership the company invented three products that totally revolutionized different industries. Now, I'm not going to sit here and say Altman is Jobs, but running a company, knowing how to raise money, knowing how to productize technologies, etc. are all very important skills that industry researchers aren't always good at.
It might be true in general; however, AI research laboratories are typically an exception, as they are often led by experienced AI researchers or scientists with extensive expertise in the field.
And that's why they have a hard time getting their stuff out there and getting the money they need. I mean, trying to run a business like a research lab is kind of flawed, you know? And you don't always want some Musk-like character messing around with the basics of the company.
Ilya gives numerous talks and interviews, and he's well worth listening to about technical matters. I listened to many of his talks recently, and the main theme is that scaling up compute works, and will continue to do so. His optimism about the potential of scaling to support deep learning has clearly guided his entire career, starting with his early success on AlexNet.
Do you think GPT-4 was trained and then immediately released to the public? Training finished Aug 2022. They spent the next 6 months improving it in other ways (eg human feedback). What he is saying is already evident therefore.
IIRC Altman has no financial stake in the success or failure of OpenAI to prevent these sorts of conflicts of interests between OpenAI and society as a whole
> OpenAI’s ChatGPT unleashed an arms race among Silicon Valley companies and investors, sparking an A.I. investment craze that proved to be a boon for OpenAI’s investors and shareholding employees.
> But CEO and co-founder Sam Altman may not notch the kind of outsize payday that Silicon Valley founders have enjoyed in years past. Altman didn’t take an equity stake in the company when it added the for-profit OpenAI LP entity in 2019, Semafor reported Friday.
Right. All the evidence points to more potential being left on the table for emergent abilities. It would make no sense that the model would develop all of these complex skills for better predicting the next token, then just stop.
It's a massive bet for a company to push compute into the billion dollar range - if saying something like this has the potential to help ward off those decisions, I don't see what's stopping them from saying it.
We will need a combination of technologies we have in order to really achieve emergent intelligence.
Humans are composed of various "subnets" modelling different aspects which, in unison, produce self-consciousness and real intelligence. What is missing in the current line of approaches is that we rely only on auto-alignment of subnetworks by machine learning, which scales only up to a point.
If we were to produce a model which has:
* something akin to an LLM as we know it today, which is able to
* store or fetch facts in short-term ("context") or long-term ("memory") storage
* if a fact is not in the current "context", query the long-term store ("memory") by keywords for associations, which are inserted one by one into the current "context"
* repeat as required until fulfilling some self-defined condition ("thinking")
To me, this is mostly mechanical plumbing work and lots of money.
Also, if we get rid of the "word-boundedness" of LLMs - which we already did to some degree, as shown by the multi-language capabilities - LLMs would be free to roam in the domain of thoughts /s :)
This approach could be further improved by meta-LLMs governing the longterm memory access, providing an "intuition" which longterm memory suits the provided context best. Apply recursion as needed to improve results (paying by exponential training time, but this meta-NN will quite probably be independent of actual training, as real life / brain organization shows).
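A toy version of that plumbing, just to make the loop concrete. call_llm() is a hypothetical stand-in for any chat/completion API and MemoryStore is a stand-in for a keyword- or embedding-indexed long-term store; nothing here is anyone's actual design.

    from typing import List

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your favourite LLM API here")

    class MemoryStore:
        """Stand-in for a long-term store queried by keywords."""
        def __init__(self):
            self.facts: List[str] = []

        def save(self, fact: str) -> None:
            self.facts.append(fact)

        def query(self, keywords: List[str]) -> List[str]:
            # crude keyword match standing in for a real retrieval index
            return [f for f in self.facts
                    if any(k.lower() in f.lower() for k in keywords)]

    def think(question: str, memory: MemoryStore, max_steps: int = 5) -> str:
        context = [question]
        for _ in range(max_steps):
            keywords = call_llm("List lookup keywords for:\n" + "\n".join(context)).split(",")
            context.extend(memory.query([k.strip() for k in keywords]))
            answer = call_llm("Answer if you can, otherwise reply MORE:\n" + "\n".join(context))
            if not answer.strip().startswith("MORE"):
                return answer  # the self-defined stop condition ("thinking" finished)
        return call_llm("Give a best-effort answer:\n" + "\n".join(context))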
The other elements that may be required could be some version of the continuous sensory input that to us creates the sensation of "living" and, this one is a bit more philosophical, the sensation of suffering and a baseline establishment that the goal of the entity is to take actions that help it avoid suffering.
That's when it gets dangerous, when we try to really recreate animal (human) characteristics in digital form. Combining that with likely 1000-1000000 X increases in performance leads to these superintelligent digital creatures taking over.
Instead, we can focus on the Star Trek computer type stuff that we have with GPT and be incredibly careful about deploying those more animal/human-like models and higher performance compute. Especially if we deliberately create the next species in digital form, make it 100X or 10000X faster-thinking/smarter than us, and then enslave it, that is not only totally stupid but also proven unnecessary by the generality of the latest AI models.
Saying “hey, don’t go down the path we are on, where we are making money and are considered the best in the world... it’s a dead end” rings pretty hollow. It feels a lot like “don’t take our lunch, please?”
Nah - GPT-4 is crazy expensive; paying $20/mo only gets you 25 messages per 3 hours, and it's crazy slow. The API is rather expensive too.
I'm pretty sure that GPT-4 is ~1T-2T parameters, and they're struggling to run it (at reasonable performance and profit). So far their strategy has been to 10x the parameter count every GPT generation, and the problem is that there are diminishing returns every time they do that. AFAIK they've now resorted to chunking GPT through the GPUs because of the 2 to 4 terabytes of VRAM required (at 16-bit).
So now they've reached the edge of what they can reasonably run, and even if they do 10x it, the expected gains are smaller. On top of this, models like LLaMA have shown that it's possible to cut the parameter count substantially and still get decent results (albeit the open-source stuff still hasn't caught up).
On top of all of this, keep in mind that at 8-bit resolution 175B parameters (GPT-3.5) requires over 175GB of VRAM. This is crazy expensive and would never fit on consumer devices. Even if you quantize down to 4-bit, you still need over 80GB of VRAM.
This definitely is not a "throw them off the trail" tactic - in order for this to actually scale the way everyone envisions, both in performance and in running on consumer devices, research HAS to be on bringing the parameter count down. And again, there's lots of research showing it's very possible to do.
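The VRAM arithmetic is easy to sanity-check; just holding the weights (ignoring activations, KV cache and batching, which add a lot more) works out to roughly the figures above. The 1T parameter count is speculation, as in the parent comment.

    # Back-of-envelope VRAM needed just to hold the weights.
    def weight_vram_gb(n_params: float, bits: int) -> float:
        return n_params * bits / 8 / 1e9

    for n_params, label in [(175e9, "175B (GPT-3.5 class)"), (1e12, "1T (speculated)")]:
        for bits in (16, 8, 4):
            print(f"{label} @ {bits}-bit: ~{weight_vram_gb(n_params, bits):.0f} GB")
    # 175B: ~350 GB at 16-bit, ~175 GB at 8-bit, ~88 GB at 4-bit.
    # 1T at 16-bit: ~2 TB, which is where the "2 to 4 terabytes" figure comes from.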
Yeah I am noticing this as well. GPT enables you to do difficult things really easily, but then it is so expensive you would need to replace it with custom code for any long term solution.
For example: you could use GPT to parse a resume file, pull out work experience and return it as JSON. That would take minutes to setup using the GPT API and it would take weeks to build your own system, but GPT is so expensive that building your own system is totally worth it.
Unless they can seriously reduce how expensive it is I don't see it replacing many existing solutions. Using GPT to parse text for a repetitive task is like using a backhoe to plant flowers.
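To illustrate how little setup the resume example takes: a minimal sketch against the pre-1.0 openai Python package. The model choice, prompt and JSON shape are made up for illustration, and real code needs the failure handling discussed further down the thread.

    import json
    import openai  # pre-1.0 openai package assumed

    openai.api_key = "sk-..."  # placeholder

    def extract_work_experience(resume_text: str) -> list:
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": "Extract work experience from the resume as a JSON list of "
                            'objects with keys "company", "title", "start", "end". '
                            "Return only JSON."},
                {"role": "user", "content": resume_text},
            ],
            temperature=0,
        )
        # json.loads can fail if the model adds prose around the JSON; real code
        # needs retries/validation, which is exactly the hidden cost people mention.
        return json.loads(resp["choices"][0]["message"]["content"])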
> For example: you could use GPT to parse a resume file, pull out work experience and return it as JSON. That would take minutes to setup using the GPT API and it would take weeks to build your own system, but GPT is so expensive that building your own system is totally worth it.
True, but an HR SaaS vendor could use that to put on a compelling demo to a potential customer, stopping them from going to a competitor or otherwise benefiting.
And anyway, without crunching the numbers, for volumes of say 1M resumes (at which point you've achieved a lot of success) I can't quite believe it would be cheaper to build something when there is such a powerful solution available. Maybe once you are at 1G resumes... My bet is still no, though.
I work for a company on the web development team. We have ~6 software developers.
I'd love to be able to just have people submit their resumes and extract the data from there, but instead I'm going to build a form and make applicants fill it out, because ChatGPT is going to cost at least $0.05 USD depending on the length of the resume.
I'd also love to have mini summaries of order returns written in human-readable form, but that would also cost $0.05 USD per form.
The tl;dr here is that there's a TON of use cases for an LLM outside of your core product (we sell clothes), but we can't currently justify that cost. Compare that to the rapidly improving self-hosted solutions, which don't cost $0.05 USD for literally any query (and likely more for anything useful).
The problem is that it would take us the same amount of time to just add a form with django. Plus you have to handle failure cases, etc.
And yeah I agree this would be a great use-case, and isn't that expensive.
I'd like to do this in lots of places, and the problem is I have to convince my boss to pay for something that otherwise would have been free.
The conversation would be: "We have to add these fields to our model, and we either tell django to add a form for them, which will have zero ongoing cost and no reliance on a third party, or we send the resume to OpenAI, pay for them to process it, build some mechanism to sanity-check what GPT is responding with, alert us if there are issues, then put it into that model, and pay 5 cents per resume."
> 1-3 hours of a fully loaded engineers salary per year.
That's assuming 0 time to implement, and because of our framework it would take more hours to implement the openai solution (that's also more like 12 hours where we are).
> $500 per 10k.
I can't stress this enough - the alternative is $0 per 10k. My boss wants to know why we would pay any money for a less reliable solution (GPT serialization is not nearly as reliable as a standard django form).
I think within the next few years we'll be able to run the model locally and throw dozens of tasks just like this at the LLM, just not yet.
For a big company that is nothing, but if you are bootstrapping and trying to acquire customers with an MVP, racking up a $500 bill is frightening. What if you offer a free trial, blow up, and end up with a $5k+ bill?
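For what it's worth, the per-resume figure is easy to reproduce from the per-1K-token prices published at the time; the prices and token counts below are assumptions, so treat it as back-of-envelope only.

    # Rough cost arithmetic behind "$0.05 per resume" / "$500 per 10k".
    GPT4_PROMPT = 0.03 / 1000       # assumed $ per prompt token (8K-context GPT-4)
    GPT4_COMPLETION = 0.06 / 1000   # assumed $ per completion token

    resume_tokens, json_tokens = 1500, 200  # assumed sizes for a typical resume and its JSON output
    per_resume = resume_tokens * GPT4_PROMPT + json_tokens * GPT4_COMPLETION
    print(f"~${per_resume:.3f} per resume, ~${per_resume * 10_000:.0f} per 10k resumes")
    # ~$0.057 per resume, ~$570 per 10k -- in the ballpark of the numbers above.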
I have tried GPT-3.5 and GPT-4 for this type of task - the "near perfect results" part is really problematic, because you need to verify that it's likely correct and get notified if there are issues, and even then you aren't 100% sure that it selected the correct first/last name.
This is compared to a standard html form. Which is.... very reliable and (for us) automatically has error handling built in, including alerts to us if there's a 504.
> Reasonable to assume that in 1-2 years it will also come down in cost.
Definitely. I'm guessing they used something like quantization to optimize the VRAM usage down to 4-bit. The thing is that if you can't fit the weights in memory then you have to chunk them, and that's slow = more GPU time = more cost. And even if you can fit it in GPU memory, less memory = fewer GPUs needed.
But we know you _can_ use fewer parameters, and that the training data + RLHF make a massive difference in quality. And the model size relates linearly to the VRAM requirements/cost.
So if you can get a 60B model to run at 175B quality, then you've cut your memory requirements to roughly a third, and can now run (with 4-bit quantization) on a single A100 80GB, which is 1/8th of the 8x A100s GPT-3.5 was previously known to run on (and still half of GPT-3.5 at 4-bit).
Also, while OpenAI likely doesn't want this, we really want these models to run on our own devices, and LLaMA+finetuning has shown promising improvements (not there just yet) at 7B size, which can run on consumer devices.
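To make the "runs on consumer devices" point concrete, this is roughly what loading a LLaMA-class 7B model in 8-bit looks like with Hugging Face transformers plus bitsandbytes (the model path is a placeholder, and you need the accelerate and bitsandbytes packages installed); 8-bit roughly halves the ~14 GB fp16 footprint.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "path/to/llama-7b"  # placeholder; the LLaMA weights are gated
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",   # let accelerate place layers on available GPU/CPU memory
        load_in_8bit=True,   # 8-bit weights via bitsandbytes
    )

    inputs = tokenizer("The age of giant AI models is", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))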
It's never been in OpenAIs interest to make their model affordable or fast, they're actually incentivized to do the opposite as an excuse to keep the tech locked up.
This is why Dall-e 2 ran in a data centre and Stable Diffusion runs on a gamer GPU
I think you're mixing the two. They do have an incentive to make it affordable and fast because that increases the use cases for it, and the faster it is the cheaper it is for them, because the expense is compute time (half the time ~= half the cost).
> This is why Dall-e 2 ran in a data centre and Stable Diffusion runs on a gamer GPU
This is absolutely why they're keeping it locked up. By simply not releasing the weights, you can't run Dalle2 locally, and yeah they don't want to do this because they want you to be locked to their platform, not running it for free locally.
It's a pretty sus argument for sure when they're scared to release even parameter size.
Although the title is a bit misleading on what he was actually saying, there's still a lot left to go in terms of scale. Even if it isn't parameter size (and there's still lots of room here too, it just won't be economical), contrary to popular belief there's lots of data left to mine.
Yeah. Sadly I believe they may lose their spot due to their concern for safety.
They got here by pushing the envelope. But now, with everyone freaking out, he's putting off the training of a new model due to 'safety issues' (with plugins, I suspect).
Is anyone aware of techniques to prune useless knowledge from a model to leave more space for reasoning capabilities?
It really shouldn't matter that it can give the exact birthdate of Steve Wozniak, as long as it can properly make a query to fetch it and deal with the result.
Following your design, couldn't you also address hallucinations with a "fact checking" LLM (connected to search) that corrects the output of the core LLM? You would take the output of the core LLM and send it to the fact checker with a prompt like "evaluate this output for any potentially false statements, and perform an internet search to validate and correct them".
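Something like that two-pass setup can be sketched with hypothetical helpers (call_llm and web_search are stand-ins, not real APIs). Whether it actually removes hallucinations rather than just laundering them is an open question.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("stand-in for any LLM API")

    def web_search(query: str) -> str:
        raise NotImplementedError("stand-in for any search API")

    def answer_with_check(question: str) -> str:
        draft = call_llm(question)
        claims = call_llm("List the factual claims in this text, one per line:\n" + draft)
        evidence = "\n".join(web_search(c) for c in claims.splitlines() if c.strip())
        return call_llm(
            "Rewrite the draft, correcting anything the evidence contradicts.\n"
            f"Draft:\n{draft}\n\nEvidence:\n{evidence}"
        )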
You can't trust what sama says. Period. He's shown his willingness to deceive for gain. He deleted the Codex model because it was better than the text model, and the only way to explain it is that by training on code instead of just text of human interactions, the model develops stronger reasoning abilities (because of higher clarity around logic and a higher signal-to-noise ratio overall in the training data). So he removed that so researchers don't catch on. Also, why is 1 trillion params the limit? Why not 2 trillion? He's been begging for regulations and trying to pull up the ladder behind him. Just really "street smart" thinking, but I have not seen any elevated, humanity-serving thinking coming from him. He would not have taken $100M for a non-profit to do open-source research only to bait-and-switch into for-profit, closed-source, sold to the highest bidder. Give me a break. Don't believe anything he says (well, use your brain, as he may mix truths and deceptions with the goal of deceiving his competitors... he's not out there to do a service for humanity; if he was, he would have kept it open and non-profit, not basically stolen the funds of the non-profit to pocket $29B from it). It's ridiculous.
The role of a CEO is more to benefit the company than the public. Only time will tell.
I am curious though how something like Moore's Law relates to this. Yes, model architectures will deal with complexity better and the amount of data helps as well. There must be a relation between technology innovation and cost which alludes to effectiveness. Innovation in computation, model architecture, quality of data, etc.
The 2017 Transformers paper has ~71,000 papers citing it. The sheer magnitude of human mental effort globally that is chasing the forefront of machine learning is unprecedented and amazing.
Gates has refuted saying this. Are you implying by analogy that Altman hasn't said/will disclaim saying that "the age of giant AI models is almost over"?
Just that there is tremendous hubris in the statement—at least when the statement stands alone. Vastly larger LLMs will probably become one or more relatively small components or layers of much larger systems that run on whatever we use as telephones in 20 years.
I suspect what he means is that OpenAI is finding diminishing returns from throwing money and hardware at larger models right now and that they are investigating other and/or composite AI techniques that make more optimal use of their hardware investment.
I don't think there is a contradiction at all. Altman is essentially saying they are running out of compute and therefore can't meaningfully scale further. Not that scaling further would be a worse plan longterm than coming up with new algorithms.
There's plenty of room for models to continue to grow once efficiency is improved. The basic premise of the Google ML pathways project is sound, you don't have to use all the model all the time. By moving to sparse activations or sparse architectures you can do a lot more with the same compute. The effective model size might be 10x or 100x GPT-4 (speculated at 1T params) but require comparable or less compute.
While not a perfect analogy it's useful to remember that the human brain has far more "parameters", requires several orders of magnitude less energy to train and run, is highly sparse, and does a decent job at thinking.
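A toy illustration of the sparse-activation idea: in a mixture-of-experts layer only the top-k experts run for each token, so total parameter count and per-token compute decouple. This numpy sketch is mine, not a description of Pathways or of anything OpenAI has built.

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, top_k = 64, 8, 2

    experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
    router = rng.standard_normal((d_model, n_experts)) * 0.02

    def moe_layer(x: np.ndarray) -> np.ndarray:
        logits = x @ router                   # score every expert for this token
        chosen = np.argsort(logits)[-top_k:]  # but only keep the top-k
        weights = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()
        # Only top_k of the n_experts weight matrices are touched, so per-token
        # compute scales with top_k while total parameters scale with n_experts.
        return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

    print(moe_layer(rng.standard_normal(d_model)).shape)  # (64,)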
The efficiency thing is what worries me. I think Nvidia has rough ideas for increasing efficiency 100 - 1000 times without changing the fundamental paradigm (i.e. memristors or crossbar arrays or something). If it doesn't go to 1000, I assume there is a lot of investment ramping up for realizing the new fully compute-in-memory systems.
If the software can leverage these efficiency gains effectively, then the concerns about runaway AI will be very relevant. Especially since people seem to think that they need to emulate all animal (like human) characteristics to get "real" general intelligence. Despite the fact that GPT is clearly general purpose. And people make no real differentiation between the most dangerous types of characteristics like self-preservation or full autonomy.
GPT shows that we can have something like a Star Trek computer without creating Data. People should really stop rushing their plans to create an army of Datas and then enslave them. Totally unnecessary and stupid.
Have we exhausted the value of larger models on current architecture? Probably yes. I trust OpenAI would throw more $ at it if there was anything left on the table.
Have we been here before? Also yes. I recall hearing similar things about LSTMs when they were in vogue.
Will the next game changing architecture require a huge model? Probably. Don’t see any sign these things are scaling _worse_ with more data/compute.
The age of huge models with current architecture could be over, but that started what, 5 years ago? Who cares?
No, we haven't. The title is misleading: there's plenty of room to scale left. Part of it might just not be economical (parameter size), but there's data. If you take this to mean "we're at a dead end", you'd be very wrong.
It's probably economically difficult to rationalize building a GPT5. They're already having a hard time getting everyone GPT4. Better to spend the resources optimizing to get everyone onto a cheap(er) GPT4-comparable model than build a marginally better model that is even more expensive to scale. Like most tech there is a sigmoid curve where you start to see diminishing returns.
LLMs will be treated much like occupations for people going forward.
Just like our own education, there will be a basic set of skills all LLMs need to be trained on to provide a minimal usable product. With GPT-4 I suspect we are at this point now, if not slightly beyond the optimal basic training stage.
However, now it is time to send our LLMs off to university, so to speak, to specialise them in certain tasks and knowledge bases. The current approach of attempting to just build everything into one does not work for a human mind, so why would it work for a vector-based replication?
I suspect in the coming months we will hear more about tiny models trained on much smaller datasets and then specialised using a mix of adapters and LoRA modifications to excel at specific tasks like code generation, translation, and conversation. Then multiple models will be implemented in one application chain to best leverage each of their respective strengths.
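For reference, the adapter/LoRA route is already cheap to experiment with via Hugging Face's peft library; a minimal sketch (the base model and target modules are placeholders and depend on the architecture):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("path/to/small-base-model")  # placeholder

    lora_cfg = LoraConfig(
        r=8,                                  # rank of the low-rank adapter matrices
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # typical attention projections in LLaMA-style models
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, lora_cfg)
    model.print_trainable_parameters()  # usually well under 1% of the base model's weights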
On the other hand though, Chinchilla and multimodal approaches already showed how later AIs can be improved beyond throwing petabytes of data at them.
It is all about variety and quality from now on I think. You can teach a person all about the color zyra but without actually ever seeing it, they will never fully understand that color.
It does seem, though, that using Chinchilla-like techniques does not create a copy with the same quality as the original. It's pretty good for some definition of the phrase, but it isn't equivalent; it's a lossy technique.
I agree on the lossy. There is a tradeoff between efficiency and comprehensiveness, kind of. It would be pretty funny if in the end, the most optimal method turns out to be the brain we already have. Extremely efficient, hardware optimized, but slow as hell and misunderstand stuff all the time unless prompted with specific phrases.
Something kind of funny (but mostly annoying), about this announcement is the people arguing that OpenAI is, in fact, working on GPT-5 in secret.
To my knowledge, NFT/crypto hype never got so bad that conspiracy theories began to circulate (though I’m sure there were some if you looked hard enough).
Isn’t it obvious? Q is definitely an LLM, trained on trillions of words exfiltrated from our nation’s secure systems. This explains why it’s always wrong in its predictions: it’s hallucinating!
Yeah, I think what's happened to OpenAI is the same thing happening to all tech companies. They were spending a lot of money building these LLMs, and now that they've been told to cut back costs, they no longer have the funding to build the next "double the size of the last one" model.
Interestingly, what makes ChatGPT work is the size of the model, so I think they've found their dead stop.
Lots of the bullish comments have been talking about how ChatGPT is a bit shit right now but will get exponentially better, and I think the answer is that now the progress will be much slower and more linear. That is, if they can stay funded, which is a very big if. As an org they are bleeding money.
"Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them."
An H100 has 80GB of VRAM. The highest-end system I can find is 8xH100. Is a 640GB model the biggest model you can run on a single system? Already GPT-4 is throttled and has a waiting list, and they haven't even released the image processing or integrations to a wide audience. Maybe they are just unable to do inference in a cost-efficient manner and at an acceptable speed on anything bigger than GPT-4?
>Maybe they are just unable to do inference in a cost-efficient manner and at an acceptable speed on anything bigger than GPT-4?
Yes, that's the whole thing. As others have pointed out, GPT-4 seems like an optimum point that balances cost, ROI, etc... it won't improve much just by throwing more data at it.
Sorry, but this sounds a lot like "640KB is all the memory you will ever need". What about a “Socratic model” for video? There should be many applications that would benefit from a bigger model.
This is just pushback from Elon and crew's fake article about the dangers of AI; they specifically state the next versions will be ultra deadly.
Sam is now saying there will be no future model that will be as good.
This is all positioning to get regulators off the track because none of these control freaks in government actually understand a whit of this.
All said and done, this all just to try to disempower the OSS community. But they can't, we're blowing past their barriers like the 90s did with the definition of slippery slope.
He isn't saying that. Only that different approaches are needed to produce better results. If he's confident to say this I think he must have something up his sleeve.
Bruv has to pay for the data he's been using, or soon there won't be any to nick. Groupies claiming their AI is "intelligent", and not just a data-ingesting beast, will soon learn a hard lesson. Take your blogs offline, stop contributing content for free, and stop pushing code, or else chavs like this one will continue monetising your hard work. As did Bezos and many others that now want you to be out of a job.
Eh, I haven't personally found a use case for LLMs yet, given that you can't trust the output and it needs to be verified by a human (which might as well be just as time-consuming/expensive as actually doing the task yourself).
I’d reconsider the “might as well just be as time consuming” thing. I see this argument about Copilot a lot, and it’s really wrong there, so it might be wrong here too.
Like, for most of the time I’m using it, Copilot saves me 30 seconds here and there and it takes me about a second to look at the line or two of code and go “yeah, that’s right”. It adds up, especially when I’m working with an unfamiliar language and forget which Collection type I’m going to need or something.
> Like, for most of the time I’m using it, Copilot saves me 30 seconds here and there and it takes me about a second to look at the line or two of code and go “yeah, that’s right”.
I've never used Copilot but I've tried to replace StackOverflow with ChatGPT. The difference is, the StackOverflow responses compile/are right. The ChatGPT responses will make up an API that doesn't exist. Major setback.
Thing is, you can't trust what you find on stack overflow or other sources either. And searching, reading documentation and so on takes a lot of time too.
I've personally been using it to explore using different libraries to produce charts. I managed to try out about 5 different libraries in a day with fairly advanced options for each using chatGPT.
I might have spent a day in the past just trying one and not to the same level of functionality.
So while it still took me a day, my final code was much better fitted to my problem with increased functionality. Not a time saver then for me but a quality enhancer and I learned a lot more too.
Maybe, maybe not. I get useful results from it, but it doesn't always work. And it's usually not quite what I'm looking for, so then I have to go digging around to find out how to tweak it. It all takes time, and you do not get a working solution out of the box most of the time.
They're good for tasks where generation is hard but verification is easy. Things like "here I gesture at a vague concept that I don't know the name of, please tell me what the industry-standard term for this thing is" where figuring out the term is hard but looking up a term to see what it means is easy. "Create an accurate summary of this article" is another example - reading the article and the summary and verifying that they match may be easier than writing the summary yourself.
I've enjoyed using it for very small automation tasks. For instance, it helped me write scripts to take all my audiobooks with poor recording quality, split them into 59-minute chunks, and upload them to Adobe's free audio enhancement site to vastly improve the listening experience.
No? I use it all the time to help me, for example, read ML threads when I run into a term I don't immediately understand. I can do things like 'explain this at the level of a high school student'
The way I see it, the expensive part should be training the models via simulated architectures on GPUs or TPUs or whatever.
But once they are trained, is there a way to encode the base models into hardware where inference costs are basically negligible? Hopefully somebody is seeing if this is possible, using structurally encoded hardware to make inference costs basically nil/constant.
I don't understand why size is an issue in the way that is being claimed here.
Intelligence isn't like processor speed. If I have a model that has (excuse the attempt at a comparison) 200 IQ, why would it matter that it runs more slowly than a human?
I don't think that, for example, Feynman at half speed would have had substantially fewer insights.
I didn't think this article was very good. Sam Altman actually implied that GPT-5 will be developed when he spoke at MIT. And if Sam said that scaling is over (I doubt he said this but I could be wrong) the interesting part would be the reasoning he provided for this statement - no mention of that in the article.
Wait till someone pulls a Cerebras for memory. And then we connect a wafer-scale processor with wafer-scale memory and BOOM. New mega-size models emerge.
As for the current claim, it might have to do with the amount of time spent taming the wildness of answers from raw GPT-4, so the focus is shifting away from increasing the model size.
"...for the current cycle, in our specific public-facing market."
As most here well know "over" is one of those words like "never" which particularly in this space should pretty much always be understood as implicitly accompanied by a footnote backtracking to include near-term scope.
An amusing thought I've had recently is whether LLMs are in the same league as the millions of monkeys at the keyboard, struggling to reproduce one of the complete works of William Shakespeare.
But I think not, since monkeys probably don't "improve" noticeably with time or input.
dudes. if none of you see it, it's because you're turning a blind eye, is the best way I can say it. the future of the 50's is about to show up, if, and only if, we're not so dumb as to ignore it. flying cars, killer robots, we can help decide, if we're only aware enough to do a bit of research, and say anything. your voice does matter :D
Maybe they're now training teams of AIs to work together? This way the hardware limitations are less of a barrier in terms of maximum memory per AI. They can run in parallel in separate hardware and communicate to reach higher goals.
Someone described to me an emerging rule called “Penelope's rule”, where you scale by having the largest and most powerful model delegate to less powerful models and then have the larger model improve upon the results (e.g. have GPT-4 delegate the more mundane tasks to GPT-3.5).
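A toy version of that delegation pattern, with strong_llm and cheap_llm as hypothetical stand-ins for something like GPT-4 and GPT-3.5 calls; not a claim about how anyone actually ships this.

    def strong_llm(prompt: str) -> str:
        raise NotImplementedError("stand-in for the expensive model")

    def cheap_llm(prompt: str) -> str:
        raise NotImplementedError("stand-in for the cheaper model")

    def delegate(task: str) -> str:
        subtasks = strong_llm("Break this task into numbered subtasks:\n" + task).splitlines()
        drafts = [cheap_llm("Do this subtask:\n" + s) for s in subtasks if s.strip()]
        # The strong model only plans and revises; the grunt work runs on the cheap model.
        return strong_llm("Combine and improve these drafts into one answer:\n" + "\n---\n".join(drafts))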
Hm, all right, I'm guessing that huge models as a business maybe are over until the economics are figured out, but huge models as experts for knowledge distillation seem reasonable. And if you pay a super premium, you can use the huge model.
It's a good thing bots don't have feelings yet, because there are going to be a lot of obsolete bots once we're done, and getting made obsolete isn't very fun.
I call bullshit. There will be bigger and better models. The question is not whether big companies will invest in training them (they will), but whether they’ll be made available to the public.
Those stochastic parrots are done for, finally! I have been telling people AI is a lie and it’s all a scam and now you can see for yourself. Buy nice warm jackets guys, winter is coming.
Sounds like we will be on GPT-4 for a while, while we wait out a new "AI winter"; mind you, there is still a lot to squeeze from GPT-4 and local/edge LLM models.
If you are worried about AI, this shouldn't make you feel a ton better. GPT4 is just trained to predict the next word, a very simple but crude approach and look what it can do!
Imagine when a dozen models are wired together and giving each other feedback with more clever training and algorithms on future faster hardware.
Machine learning is actually premised on being “simple” to implement. The more priors you hardcode with clever algorithms, the closer you get to what we already have. The point is to automate the process of learning. We do this now with relatively simple loss functions and models containing relatively simple parameters. The main stipulation is that they are all defined to be continuous so that you can use the chain rule from calculus to calculate the error with respect to every parameter without taking so long that it would never finish.
I agree that your suggested approach of applying cleverness to what we have now will probably produce better results. But that’s not going to stop better architectures, hardware and even entire regimes from being developed until we approach AGI.
My suspicion is that there’s still a few breakthroughs waiting to be made. I also suspect that sufficiently advanced models will make such breakthroughs easier to discover.
I have repeatedly argued against this notion of „just predicting the next word“. No. It‘s completing a conversation. It‘s true that it is doing this word by word, but it‘s kind of like saying a CNN is just predicting a label. Sure, but how? It‘s not doing it directly. It‘s doing it by recovering a lot of structure and in the end boiling that down to a label. Likewise a network trained to predict the next word may very well have worked out the whole sentence (implicitly, not as a text) in order to generate the next word.
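The "one token at a time, but conditioned on everything so far" mechanics are easy to see in a manual greedy decoding loop with a small open model; each step re-feeds the whole prefix, even though only one token comes out:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The age of giant AI models is", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(10):
            logits = model(ids).logits[:, -1, :]           # distribution over the next token only
            next_id = logits.argmax(dim=-1, keepdim=True)  # greedy pick
            ids = torch.cat([ids, next_id], dim=-1)        # append, then re-condition on the full prefix
    print(tok.decode(ids[0]))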
I actually have high hopes for the hybrid architecture Ben Goertzel has been working on at OpenCog. I think the LLMs are soon going to hit an S-curve without the introduction of additional scientific knowledge like physics and a notion of energy (wrt AGI development; they'll still be good for tonnes of other job-displacing things).
FWIW we had thin clients in computer labs in middle school / high school 15 years ago (and still today these are common in enterprise environments, e.g. Citrix).
Biggest issue is network latency which is limited by the speed of light, so I imagine if computers in 10 years require resources not available locally it would likely be a local/cloud hybrid model.
Personally, I'm less worried about AI than I am about what people using these models can do to others. Misinformation/disinformation, more believable scams, stuff like that.
I agree, probably for a different reason. IMO the fact that Sam is saying this likely means that the LLMs are close to the upper knee of the S-curve, and after GPT-5 they likely don't have many new fundamental ideas (additional S-curves) to throw at it. Since they are ahead of the rest, it likely means we might be headed for an AI improvement pause for a couple of years after GPT-5.
Still good enough to upset the balance in search/ad market. Interesting times.
> it likely means we might be headed for an AI improvement pause for couple of years after GPT5.
I suspect that a pause in base LLM performance won’t be an AI improvement pause; there’s a whole lot of space to improve the parts of AI systems around the core “brain in a jar” model.
I agree, there will be other things to be improved in AI systems, but IMHO (tea-leaf reading, really) that would only lead to incremental improvements in overall systems. Also, there is a lot of 'interfacing' work that needs to happen, and I suspect that would end up filling the pause; this really is LLM productization, loosely speaking.
As far as AGI is concerned, I don't believe LLMs are really the right architecture for it; AGI likely needs some symbolic logic and a notion of physicality (i.e. physical laws & energy/power).
> but IMHO (tea leaves reading really) it would only lead to incremental improvements in overall systems.
It will reach a point where that is the case, sure; it is not there now, and if we are within one model generation of exhausting (for now) major core model improvements, I don’t think we’ll have reached the point of gradual incremental improvement from rest-of-system improvements yet.
Dude, someone lied their way into being president of the United States, all while people fact-checked him basically immediately after each lie. Lying doesn't make a difference.
He was a guest on an AI-related discussion panel [1]. I did not attend, but I would guess someone asked him “do you think models will have to continuing growing to even larger sizes,” or something like that, and this was his answer.
Honestly, that anyone thinks he has “ulterior motives” here is hilarious to me. Every day I think more and more that we can no longer think critically on the internet.
CEOs often make public statements for recruiting and to steer the company internally; it's important to mimic outside the company what you say inside it, maybe even more so.
It's not like an accusation of hypocrisy is automatically wrong, or anything.
It really confuses me why you think such a mundane political statement is a reddit thing. Especially if you think the rent free thing isn't reddity, in which case you're not drawing with a super broad brush in a way I could at least understand.
Edit: Could you please stop posting unsubstantive comments and flamebait generally? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for. We ban accounts that keep posting like this:
These kinds of comments have me update my priors, so I downgrade them in relevance. Two days ago we were talking about AGI, and Altman was letting himself be coddled, knowing full well it was mindless hype. Now he's trolling the other way. OK. My weights have been updated.