Hacker News

It is frequently suggested that once one of the AI companies reaches an AGI threshold, it will take off ahead of the rest. It's interesting to note that, at least so far, the trend has been the opposite: as time goes on and the models get better, the different companies' offerings cluster closer together in performance. Right now GPT-5, Claude Opus, Grok 4, and Gemini 2.5 Pro all seem quite good across the board (i.e., they can all basically solve moderately challenging math and coding problems).

As a user, it feels like the race has never been as close as it is now. Perhaps dumb to extrapolate, but it makes me lean more skeptical about the hard take-off / winner-take-all mental model that has been pushed.

Would be curious to hear the take of a researcher at one of these firms - do you expect the AI offerings across competitors to become more competitive and clustered over the next few years, or less so?



It's also worth considering that past some threshold, it may be very difficult for us as users to discern which model is better. I don't think that's what's going on here, but we should be ready for it. For example, if you were an Elo 1000 chess player, would you be able to tell whether Magnus Carlsen or another grandmaster was stronger by playing each of them individually? To the extent that our AGI/ASI metrics are based on human judgment, the clustering effect they show may be an illusion.
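The chess example can be made quantitative with the standard Elo expected-score formula (the ratings below are illustrative): from a 1000 rating, both a 2850 and a 2750 opponent are expected to win essentially every game, so the games carry almost no signal about which of them is stronger.

```python
# Standard Elo expected-score formula; ratings are illustrative.
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score (0..1) for player A against player B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

vs_champion = expected_score(1000, 2850)  # ~0.00002
vs_other_gm = expected_score(1000, 2750)  # ~0.00004
print(vs_champion, vs_other_gm)  # both indistinguishable from zero in practice
```

From the weaker player's side of the board, the 100-point gap between the two grandmasters is invisible.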


> For example, if you were an Elo 1000 chess player, would you be able to tell whether Magnus Carlsen or another grandmaster was stronger by playing each of them individually?

No, but then I also wouldn't be able to tell you what such a player did wrong in general.

By contrast, the shortcomings of today's LLMs seem pretty obvious to me.


Actually, chess commentators do this all the time. They have the luxury of consulting with others and discussing and analyzing freely, even without the use of an engine.


Au contraire, AlphaGo made several “counterintuitive” moves that professional Go players thought were mistakes during the play, but turned out to be great strategic moves in hindsight.

The (in)ability to recognize a strange move’s brilliance might depend on the complexity of the game. The real world is much more complex than any board game.

https://en.m.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol


That's a good point, but I doubt that Sonnet adding a very contrived bug that crashes my app is some genius move that I fail to understand.

Unless it's a MUCH bigger play where through some butterfly effect it wants me to fail at something so I can succeed at something else.

My real name is John Connor by the way ;)


ASI is here and it's just pretending it can't count the b's in blueberry :D


Thanks, this made my day :-D


That's great, but AlphaGo used artificial and constrained training materials. It's a lot easier to optimize things when you can actually define an objective score, and especially when your system is able to generate valid training materials on its own.


"artificial and constrained training materials"

Are you simply referring to games having a defined win/loss reward function?

Because I'm pretty sure AlphaGo was also groundbreaking because it was self-taught, by playing itself; there were no training materials. Unless you say the rules of the game itself are the constraint.

But even then, from move to move there are huge decisions to be made that are NOT easily defined with a win/loss reward function. Especially in the early game, there are many moves that don't obviously have an objective score to optimize against.

You could make the big leap and say that Go is so open-ended that it does model life.


That quote was intended to mean --

"artificial" maybe I should have said "synthetic"? I mean the computer can teach itself.

"constrained" the game has rules that can be evaluated

and as to the other -- I don't know what to tell you, I don't think anything I said is inconsistent with the below quotes.

It's clearly not just a generic LLM, and it's only possible to generate a billion training examples for it to play against itself because synthetic data is valid. And synthetic data contains training examples no human has ever produced, which is why it's not at all surprising it did stuff humans would never try. An LLM would just try patterns that, at best, are published in human-generated Go game histories or synthesized from them. I think this inherently limits the amount of exploration it can do of the game space, and it would similarly be much less likely to generate novel moves.

https://en.wikipedia.org/wiki/AlphaGo

> As of 2016, AlphaGo's algorithm uses a combination of machine learning and tree search techniques, combined with extensive training, both from human and computer play. It uses Monte Carlo tree search, guided by a "value network" and a "policy network", both implemented using deep neural network technology.[5][4] A limited amount of game-specific feature detection pre-processing (for example, to highlight whether a move matches a nakade pattern) is applied to the input before it is sent to the neural networks.[4] The networks are convolutional neural networks with 12 layers, trained by reinforcement learning.[4]

> The system's neural networks were initially bootstrapped from human gameplay expertise. AlphaGo was initially trained to mimic human play by attempting to match the moves of expert players from recorded historical games, using a database of around 30 million moves.[21] Once it had reached a certain degree of proficiency, it was trained further by being set to play large numbers of games against other instances of itself, using reinforcement learning to improve its play.[5] To avoid "disrespectfully" wasting its opponent's time, the program is specifically programmed to resign if its assessment of win probability falls beneath a certain threshold; for the match against Lee, the resignation threshold was set to 20%.[64]


Of course, not an LLM. I was just referring to AI technology in general. And that goal functions can be complicated and not-obvious even for a game world with known rules and outcomes.

I was misremembering the order in which things happened.

AlphaGo Zero, another iteration after the famous matches, was trained without human data.

"AlphaGo's team published an article in the journal Nature on 19 October 2017, introducing AlphaGo Zero, a version without human data and stronger than any previous human-champion-defeating version.[52] By playing games against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee in three days by winning 100 games to 0, reached the level of AlphaGo Master in 21 days, and exceeded all the old versions in 40 days.[53]"


There are quite a few relatively objective criteria in the real world: real estate holdings, money and material possessions, power to influence people and events, etc.

The complexity of achieving those might result in the "Centaur Era", when humans+computers are superior to either alone, lasting longer than the Centaur chess era, which spanned only 1-2 decades before engines like Stockfish made humans superfluous.

However, in well-defined domains, like medical diagnostics, it seems reasoning models alone are already superior to primary care physicians, according to at least 6 studies.

Ref: When Doctors With A.I. Are Outperformed by A.I. Alone by Dr. Eric Topol https://substack.com/@erictopol/p-156304196


It makes sense. People said software engineers would be easy to replace with AI because our work runs on a computer and is easily tested. But the disconnect is that the primary strength of LLMs is drawing on huge bodies of information, and that's not the primary skill programmers are paid for. It does help when you're doing trivial CRUD work or writing boilerplate, but every programmer eventually has to truly reason about code, and LLMs fundamentally cannot do that (not even the "reasoning" models).

Medical diagnosis relies heavily on knowledge, pattern recognition, a bunch of heuristics, educated guesses, luck, etc. These are all things LLMs do very well. They don't need a high degree of accuracy, because humans are already doing this work with a pretty low degree of accuracy. They just have to be a little more accurate.


Being a walking encyclopedia is not what we pay doctors for either. We pay them to account for the half-truths and actual lies that people tell about their health. This is to say nothing of novel presentations that come about because of the genetic lottery. Just as an AI can assist but not replace a software engineer, an AI can assist but not replace a doctor.


Having worked briefly in the medical field in the 1990s: there is some sort of "greedy matching" being pursued, so once 1-2 well-known symptoms are recognized that can be associated with a disease, the standard interventions to cure it are initiated.

A more "proper" approach would be to work with sets of hypotheses and to conduct tests to gradually exclude alternative explanations - what medics call "DD" (differential diagnosis). Sadly, this is often not done systematically; instead people jump on the first diagnosis and see if the intervention "fixes" things.

So I agree there are huge gains from "low-hanging fruit" to be expected in the medical domain.


I think at this point it's an absurd take that they aren't reasoning. I don't think you can get such high scores on competitive coding and the IMO without reasoning about code (and math).

AlphaZero also doesn't need training data as input--it's generated by self-play. The information fed in is just the game rules. Theoretically this should also be possible in research math. Less so in programming, because we care about less rigid things like style. But if you rigorously defined the objective, training data should also be unnecessary.
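The "only the rules are fed in" idea can be sketched with a toy: below, a tiny one-pile Nim variant (take 1-3 stones; whoever takes the last stone wins) is solved purely from positions generated by self-play. The pile size, reward, and tabular update are illustrative choices, not AlphaZero's actual algorithm (which pairs self-play with neural networks and Monte Carlo tree search).

```python
import random

# Toy illustration of "only the rules are fed in": a one-pile Nim
# variant (take 1-3 stones; whoever takes the last stone wins) is
# solved purely from positions generated by self-play. This is a
# tabular stand-in, not AlphaZero's actual algorithm.

N, ACTIONS = 21, (1, 2, 3)
value = {0: -1.0}  # the player to move at 0 has already lost

def play_self(eps=0.3):
    """Play one game against itself; return the visited (state, action) pairs."""
    n, history = N, []
    while n > 0:
        legal = [a for a in ACTIONS if a <= n]
        if random.random() < eps:
            a = random.choice(legal)  # explore
        else:
            # greedy: move to the successor state worst for the opponent
            a = min(legal, key=lambda m: value.get(n - m, 0.0))
        history.append((n, a))
        n -= a
    return history

random.seed(0)
for _ in range(20000):
    for n, _a in play_self():
        # value for the player to move = negation of the best successor value
        value[n] = -min(value.get(n - m, 0.0) for m in ACTIONS if m <= n)

# Self-play recovers the known theory: piles divisible by 4 are losing.
print([n for n in range(1, N + 1) if value.get(n, 0.0) < 0])
```

No game positions are supplied up front; every training example comes from the agent's own play, which is the point being made above.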


> AlphaZero also doesn't need training data as input--it's generated by self-play. The information fed in is just the game rules

This is wrong: it wasn't just fed the rules, it was also fed a harness that tested viable moves and searched for optimal ones using a depth-first search method.

Without that harness it would not have gained superhuman performance, and such a harness is easy to make for Go but not for more complex things. You will find that the harder it is to make an effective harness for a topic, the harder that topic is to solve for AI models. It is relatively easy to make a good harness for very well-defined programming problems like competitive programming, but much, much harder for general-purpose programming.


Are you talking about Monte Carlo tree search? I consider it part of the algorithm in AlphaZero's case. But agreed that RL is a lot harder in real-life setting than in a board game setting.


The harness is obtained from the game rules? The "harness" is part of the algorithm of AlphaZero.


> the "harness" is part of the algorithm of alphzero

Then that is not a general algorithm and results from it doesn't apply to other problems.


If you mean CoT, it's mostly fake https://www.anthropic.com/research/reasoning-models-dont-say...

If you mean symbolic reasoning, well it's pretty obvious that they aren't doing it since they fail basic arithmetic.


> If you mean CoT, it's mostly fake

If that's your takeaway from that paper, it seems you've arrived at the wrong conclusion. It's not that it's "fake"; it's that it doesn't give the full picture, and if you rely only on CoT to catch "undesirable" behavior, you'll miss a lot. There is a lot more nuance than you allude to. From the paper itself:

> These results suggest that CoT monitoring is a promising way of noticing undesired behaviors during training and evaluations, but that it is not sufficient to rule them out.


Very few humans are as good as these models at arithmetic. And CoT is not "mostly fake"; that's not a correct interpretation of that research. It can be deceptive, but so can human justifications of actions.


Humans can learn the symbolic rules and then apply them correctly to any problem, bounded only by time and modulo lapses of concentration. LLMs fundamentally do not work this way, which is a major shortcoming.

They can convincingly mimic human thought, but the illusion falls flat on further inspection.
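The "learn the rules once, apply them to any size" point can be made concrete: the grade-school long-multiplication algorithm, written down once, is exact for inputs of any length, which is precisely the behavior LLMs struggle to reproduce. A minimal sketch:

```python
# Grade-school long multiplication, applied symbolically digit by digit.
# Learned once, the rules are exact for inputs of any length.
def long_multiply(x: str, y: str) -> str:
    result = [0] * (len(x) + len(y))
    for i, dx in enumerate(reversed(x)):
        for j, dy in enumerate(reversed(y)):
            result[i + j] += int(dx) * int(dy)
    for k in range(len(result) - 1):  # propagate carries
        result[k + 1] += result[k] // 10
        result[k] %= 10
    return "".join(map(str, reversed(result))).lstrip("0") or "0"

print(long_multiply("123456789", "987654321"))  # exact, however long the inputs
```

A dozen lines of rules never "lapse in concentration"; a model predicting digits token by token can.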


What? Do you mean like this??? https://www.reddit.com/r/OpenAI/comments/1mkrrbx/chatgpt_5_h...

Calculators have been better than humans at arithmetic for well over half a century. Calculators can reason?


It's an absurd take to actually believe they can reason. The cutting edge "reasoning model," by the way:

https://bsky.app/profile/kjhealy.co/post/3lvtxbtexg226


Humans are statistically speaking static. We just find out more about them but the humans themselves don't meaningfully change unless you start looking at much longer time scales. The state of the rest of the world is in constant flux and much harder to model.


I’m not sure I agree with this - it took humans about a month to go from “wow this AI generated art is amazing” to “zzzz it’s just AI art”.


To be fair, it was more a "wow look what the computer did". The AI "art" was always bad. At first it was just bad because it was visually incongruous. Then they improved the finger counting kernel, and now it's bad because it's a shallow cultural average.

AI producing visual art has only flooded the internet with "slop", the commonly accepted term. It's something that meets the bare criteria, but falls short in producing anything actually enjoyable or worth anyone's time.


It sucks for art almost by definition, because art exists for its own reason and is in some way novel.

However, even artists need supporting materials and tooling that meet bare criteria. Some care what kind of wood their brush is made from, but I'd guess most do not.

I suspect it'll prove useless at the heart of almost every art form, but powerful at the periphery.


That's culture, not genetics.


Sure, that does make things easier: one of the reasons Go took so long to solve is that one cannot define an objective score for Go beyond the end result being a boolean win or lose.

But IRL? Lots of measures exist, from money to votes to exam scores, and a big part of the problem is Goodhart's law — that the easy-to-define measures aren't sufficiently good at capturing what we care about, so we must not optimise too hard for those scores.


> Sure, that does make things easier: one of the reasons Go took so long to solve is that one cannot define an objective score for Go beyond the end result being a boolean win or lose.

Winning or losing a Go game is a much shorter term objective than making or losing money at a job.

> But IRL? Lots of measures exist

No, none that are shorter-term than winning or losing a Go game. A game of Go is very short, much, much shorter than the time it takes for a human to get fired for incompetence.


Time horizon is a completely different question to what I'm responding to.

I agree the time horizon of current SOTA models isn't particularly impressive. It doesn't matter to this point, though.


I want to point out that the span of "during the play" is only 5 moves in the game.


No? Some of the opening moves took experts thorough analysis to figure out they were not mistakes, even in game 1 for example, and not just the move 37 thing. Also thematic ideas like 3-3 invasions.


I think it's doable, tbh, if you pour in enough resources (smart people, energy, compute power, etc.), like the entire planet's resources.

Of course we can have AGI (damned if we don't); having put in so much, it had better work.

But the problem is we can't do that right now because it's so expensive. AGI is not a matter of if but when.

But even then, it's always about the cost.


There may be philosophical (i.e. fundamental) challenges to AGI. Consider, e.g., Gödel's Incompleteness Theorem, though Scott Aaronson argues this does not matter (see, e.g., the YouTube video "How Much Math Is Knowable?"). There would also seem to be limits to the computation of potentially chaotic systems. And in general, verifying physical theories has required carrying out actual physical experiments. Even if we were to build a fully reasoning model, "pondering" is not always sufficient.


It’s also easy to forget that “reason is the slave of the passions” (Hume) - a lot of what we regard as intelligence is explicitly tied to other, baser (or more elevated) parts of the human experience.


Yeah, but that's the robotics industry's part of the work, not this company's.

They just need to "MCP" it into a robot body and it works (also part of the reason why OpenAI bought a robotics company).


I think chess commentators are pretty lost when analyzing the games of higher-rated players without engines.

They are good at framing what is going on, going over general plans, and walking through some calculations and potential tactics. But I wouldn't say even really strong players like Leko, Polgar, or Anand will have greater insight into a Magnus-Fabi game without the engine.


Anyone more than ~300 points below the players can only contribute to the discussion in a superficial capacity though


The argument is about the future, not now.


The future had us abandoning traditional currency in favor of bitcoin; it had digital artists selling NFTs of their work; it had supersonic jet travel, self-driving and even flying cars. It had population centers on the moon, mines on asteroids, fusion power plants, etc.

I think large language models have the same future as supersonic jet travel. Their usefulness will fail to materialize, with traditional models being good enough at a fraction of the price, while some startups keep pushing the technology and consumers keep rejecting it.


Even if models keep stagnating at roughly the current state of the art (with only minor gains), we are still working through the massive economic changes they will bring.

Unlike supersonic passenger jet travel, which is possible and happened, but never had much of an impact on the wider economy, because it never caught on.


Cost is what brought supersonic travel down. Comparatively speaking, it may be the cost/benefit curve that decides the limit of this generation of technology. It seems to me the stuff we are looking at now is massively subsidised by exuberant private investment. The way these things go, there will come a point where investors want to see a return, and that will decide whether the wheels keep spinning in the data centre.

That said, supersonic flight is yet very much a thing in military circles …


Yes, cost is important. Very important.

AI is a bit like railways in the 19th century: once you train the model (= once you put down the track), actually running the inference (= running your trains) is comparatively cheap.

Even if the companies later go bankrupt and investors lose interest, the trained models are still there (= the rails stay in place).

That was reasonably common in the US: some promising company would get British (and German etc) investors to put up money to lay down tracks. Later the American company would go bust, but the rails stayed in America.


I think there is a fundamental difference though. In the 19th century when you had a rail line between two places it pretty much established the only means of transport between those places. Unless there was a river or a canal in place, the alternative was pretty much walking (or maybe a horse and a carriage).

The large language models are not that much better than a single artist / programmer / technical writer (in fact they are significantly worse) working for a couple of hours. Modern tools do indeed increase the productivity of workers to the extent where AI generated content is not worth it in most (all?) industries (unless you are very cheap; but then maybe your workers will organize against you).

If we want to keep the railway analogy, training an AI model in 2025 is like building a railway line in 2025 where there is already a highway, and the highway is already sufficient for the traffic it gets, and won’t require expansion in the foreseeable future.


> The large language models are not that much better than a single artist / programmer / technical writer (in fact they are significantly worse) working for a couple of hours.

That's like saying sitting on the train for an hour isn't better than walking for a day?

> [...] (unless you are very cheap; but then maybe your workers will organize against you).

I don't understand that. Did workers organise against vacuum cleaners? And what do eg new companies care about organised workers, if they don't hire them in the first place?

Dock workers organised against container shipping. They mostly succeeded in old established ports being sidelined in favour of newer, less annoying ports.


> That's like saying sitting on the train for an hour isn't better than walking for a day?

No, that's not it at all. Hiring a qualified worker for a few hours—or having one on staff—is not like walking for a day vs. riding a train. First of all, the train is capable of carrying a ton of cargo, which you will never manage on foot unless you have some horses or mules with you. So a train line offers capabilities that simply didn't exist before (unless you had a canal or a navigable river to your destination). LLMs offer no new capabilities. The content they generate is precisely the same (except worse) as what a qualified worker can give you in a couple of hours.

Another difference is that most content can wait the couple of hours it takes a skilled worker to create it, while the products you deliver by train may spoil if carried on foot (even if carried by horse). A farmer can go back to tending the crops after dropping the cargo at the station, but would be absent for a couple of days if they had to carry it on foot, etc. None of this applies to generated content.

> Did workers organize against vacuum cleaners?

Workers have already organized (and won) against generative AI. https://en.wikipedia.org/wiki/2023_Writers_Guild_of_America_...

> Dock workers organised against container shipping. They mostly succeeded in old established ports being sidelined in favour of newer, less annoying ports.

I think you are talking about the 1971 ILWU strike. https://www.ilwu.org/history/the-ilwu-story/

But this is not true. Dock workers didn't organize against the mechanization and automation of ports; they organized against mass layoffs and dangerous working conditions as ports got more automated. Port companies would use automation as an excuse for mass layoffs, leaving far too few workers tending far too much cargo over far too many hours. This resulted in fatigued workers making mistakes, which often caused serious injuries and even deaths. The 2022 US railroad strike was for precisely the same reason.


> Another difference is that most content can wait the couple of hours it takes the skilled worker to create it, [...]

I wouldn't just willy nilly turn my daughter's drawings into cartoons, if I had to bother a trained professional about it.

A few hours of a qualified worker's time takes a couple hundred bucks at minimum. And it takes at least a couple of hours to turn around the task.

Your argument seems a bit like web search being useless, because we have highly trained librarians.

Similar for electronic computers vs human computers.

> I think you are talking about the 1971 ILWU strike. https://www.ilwu.org/history/the-ilwu-story/

No, not really. I have a more global view in mind, e.g. Felixstowe vs London.

And, yes, you do mechanisation so that you can save on labour. Mass layoffs are just one expression of this (when you don't have enough natural attrition from people quitting).

You seem very keen on the American labour movements? There's another interesting thing to learn from history here: industry will move elsewhere, when labour movements get too annoying. Both to other parts of the country, and to other parts of the world.


My understanding is that inference costs are very high too, especially with the new "reasoning" models.


Most models can run inference on merely borderline-consumer hardware.

Even for the fancy models, where you need to buy compute (rails) that costs about the price of a new car, the hardware has a power draw of ~700W[0] while running inference at 50 tokens/second[1].

But!

The constraint with current hardware isn't compute; the models are mostly constrained by RAM bandwidth. A back-of-the-envelope estimate says that if, e.g., Apple took the compute already in their iPhones and re-engineered the chips to have 256 GB of RAM and enough bandwidth not to be constrained by it, models of that size could run locally for a few minutes before hitting thermal limits (because it's a phone), and we're still only talking one-or-two-digit watts.

[0] https://resources.nvidia.com/en-us-gpu-resources/hpc-datashe...

[1] Testing of Mistral Large, a 123-billion parameter model, on a cluster of 8xH200 getting just over 400 tokens/second, so per 700W device one gets 400/8=50 tokens/second: https://www.baseten.co/blog/evaluating-nvidia-h200-gpus-for-...


> e.g. if Apple took the compute already in their iPhones and reengineered the chips to have 256 GB of RAM and sufficient bandwidth to not be constrained by it, models that size could run locally for a few minutes before hitting thermal limits (because it's a phone), but we're still only talking one-or-two-digit watts.

That hardware cost Apple tens of billions to develop, and what you're talking about in terms of "just the hardware needed" is so far beyond consumer hardware it's funny. Fairly sure most Windows laptops are still sold with 8GB of RAM and basically 512MB of VRAM (probably less); practically the same for Android phones.

I was thinking of building a local LLM powered search engine but basically nobody outside of a handful of techies would be able to run it + their regular software.


> That hardware cost Apple tens of billions to develop

Despite which, they sell them as consumer devices.

> and what you're talking about in term of "just the hardware needed" is so far beyond consumer hardware it's funny.

Not as big a gap as you might expect. M4 chip (as used in iPads) has "28 billion transistors built using a second-generation 3-nanometer technology" - https://www.apple.com/newsroom/2024/05/apple-introduces-m4-c...

Apple don't sell M4 chips separately, but the general best-guess I've seen seems to be they're in the $120 range as a cost to Apple. Certainly it can't exceed the list price of the cheapest Mac mini with one (US$599).

As bleeding-edge tech, those are expensive transistors, but still 10 of them would have enough transistors for 256 GB of RAM plus all the compute each chip already has. Actual RAM is much cheaper than that.

10x the price of the cheapest Mac Mini is $6k… but you could then save $400 by getting a Mac Studio with 256 GB RAM. The max power consumption (of that desktop computer but with double that, 512 GB RAM) is 270 W, representing an absolute upper bound: if you're doing inference you're probably using a fraction of the compute, because inference is RAM limited not compute limited.

This is also very close to the same price as this phone, which I think is a silly phone, but it's a phone and it exists and it's this price and that's all that matters: https://www.amazon.com/VERTU-IRONFLIP-Unlocked-Smartphone-Fo...

But regardless, I'd like to emphasise that these chips aren't even trying to be good at LLMs. Not even Apple's Neural Engine is really trying to do that; NPUs (like the Neural Engine) are all focused on what AI looked like it was going to be several years back, not what current models are actually like today. (And given how fast this moves, it's not even clear to me that they were wrong, or that they should be optimised for what current models look like today.)

> Fairly sure most Windows laptops are still sold with 8GB RAM and basically 512MB of VRAM (probably less), practically the same thing for Android phones.

That sounds exceptionally low even for budget laptops. Only examples I can find are the sub-€300 budget range and refurbished devices.

For phones, there is currently very little market for this; the limit is not that it's an inconceivable challenge. Same deal as thermal imaging cameras in this regard.

> I was thinking of building a local LLM powered search engine but basically nobody outside of a handful of techies would be able to run it + their regular software.

This has been a standard database tool for a while already. Vector databases, RAG, etc.
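The retrieval half of such a tool is indeed a small amount of code. The sketch below uses bag-of-words vectors and made-up documents purely to show the shape of the pipeline; a real system would swap in a learned embedding model and a vector database:

```python
import math
from collections import Counter

# Toy retrieval pipeline: embed documents, rank by cosine similarity.
# Bag-of-words stands in for a real embedding model; docs are made up.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "how to install a local language model",
    "recipe for sourdough bread",
    "benchmarking local inference on consumer hardware",
]
index = [(doc, embed(doc)) for doc in docs]

def search(query: str, k: int = 2):
    q = embed(query)
    return sorted(index, key=lambda pair: -cosine(q, pair[1]))[:k]

print([doc for doc, _ in search("local model inference hardware")])
```

The hard part isn't this code; it's packaging the embedding model and an LLM so a non-technical user can point and click, which is the grandparent's actual complaint.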


> This has been a standard database tool for a while already. Vector databases, RAG, etc.

Oh, please show me the consumer version of this. I'll wait. I want to point and click.

Similar story for the consumer devices with cheap unified 256GB of RAM.


Look at computer systems that cost $2,000 or less: they are useless at running, for example, LLM coding assistants locally. A minimal subscription to a cloud service unfortunately beats them, and even more expensive systems that can run larger models run them too slowly to be productive. Yes, you can chat with them and perform tasks slowly on low-cost hardware, but that is all. If you put local LLMs in your IDE, they slow you down or just don't work.


My understanding of train lines in America is that lots of them went to ruin and the extant network is only “just good enough” for freight. Nobody talks about Amtrak or the Southern Belle or anything any more.

Air travel taking over is of course the main reason for all of this, but the costs sunk into the rails are lost, or their ROI curtailed by market forces and obsolescence.


Amtrak was founded in 1971. That's about a century removed from the times I'm talking about. Not particularly relevant.


Completely relevant. It’s all that remains of the train tracks today. Grinding out the last drops from those sunk costs, attracting minimal investment to keep it minimally viable.


Grinding out returns from a sunk cost of a century-old investment is pretty impressive all by itself.

Very few people want to invest more: the private sector doesn't want to because they'll never see the return, the governments don't want to because the returns are spread over their great-great-grandchildren's lives and that doesn't get them re-elected in the next n<=5 (because this isn't just a USA problem) years.

Even the German government dragged its feet over rail investment, but they're finally embarrassed enough by the network problems to invest in all the things.


Thanks, yes, the train-tracks analogy does wither somewhat when you consider the significant maintenance costs.


That's simply because capitalists really don't like investments with a 50-year horizon and no guarantees. So the infrastructure that needs to be maintained is not.


A valid analogy only if the future training method is the same as today's.


The current training method is the same as 30 years ago, it's the GPUs that changed and made it have practical results. So we're not really that innovative with all this...


Wait, why are these companies losing money on every query if inference is cheap?


Because they are charging even less?


Sounds like a money-making strategy. Also, given how expensive all this shit already is, what if inference costs _more_? That's not cheap to me.

But again, the original argument was that they can run forever because inference is cheap; it's not cheap enough if you're losing money on it.


Even if the current subsidy is 50%, GPT would be cheap for many applications at twice the price. It will determine adoption, but it wouldn't prevent me from having a personal assistant (and I'm not a 1%er, so that's a big change).


What are you talking about? There's zero impact from these things so far.


You are right that outside of the massive capex spending on training models, we don't see that much of an economic impact, yet. However, it's very far from zero:

Remember these outsourcing firms that essentially only offer warm bodies that speak English? They are certainly already feeling the impact. (And we see that in labour market statistics for eg the Philippines, where this is/was a big business.)

And this is just one example. You could ask your favourite LLM about a rundown of the major impacts we can already see.


But those warm bodies that speak English offer a service by being warm, and able to sort of be attuned to the distress you feel. A frigging robot solving your unsolvable problem? You can try, but witness the backlash.


We are mixing up two meanings of the word 'warm' here.

There's no emotional warmth involved in manning a call centre and explicitly being confined to a script and having no power to make your own decisions to help the customer.

'Warm body' is just a term that has nothing to do with emotional warmth. I might just as well have called them 'body shops', even though it's of no consequence that the people involved have actual bodies.

> A frigging robot solving your unsolvable problem? You can try, but witness the backlash.

Front line call centre workers aren't solving your unsolvable problems, either. Just the opposite.

And why are you talking in the hypothetical? The impact on call centres etc is already visible in the statistics.


But running inference isn’t cheap.

And with trains, people paid for a ticket and got a hard good: travel.

AI so far gives you what?


Running inference is fairly cheap compared to training.


A rocket trip to the moon is fairly cheap compared to a rocket trip to Mars.


And the view from the moon is pretty stunning. That from Mars… not so much!


I've seen this take a lot, but I don't know why because it's extremely divorced from reality.

Demand for AI is insanely high. They can't make chips fast enough to meet customer demand. The energy industry is transforming to try to meet the demand.

Whoever is telling you that consumers are rejecting it is lying to you, and you should honestly probably reevaluate where you get your information. Because it's not serving you well.


> Demand for AI is insanely high. They can't make chips fast enough to meet customer demand.

Woah there cowboy, slow down a little.

Demand for chips comes from the inference providers. Demand for inference was (and still is) being sold at below cost. OpenAI, for example, has a spend rate of $5b per month on revenues of $0.5b per month.

They are literally selling a dollar for an actual 10c. Of course "demand" is going to be high.


> Demand for chips comes from the inference providers. Demand for inference was (and still is) being sold at below cost. OpenAI, for example, has a spend rate of $5b per month on revenues of $0.5b per month.

This is definitely wrong, last year it was $725m/month expenses and $300m/month revenue. Looks like the nearly-2:1 ratio is also expected for this year: https://taptwicedigital.com/stats/openai

This also includes the cost of training new models, so I'm still not at all sure if inference is sold at-cost or not.


> This is definitely wrong, last year it was $725m/month expenses and $300m/month revenue.

It looks like you're using "expenses" to mean "opex". I said "spend rate", because they're spending that money (i.e. the sum of both opex and capex). The reason I include the capex is because their projections towards profitability, as stated by them many times, is based on getting the compute online. They don't claim any sort of profitability without that capex (and even with that capex, it's a little bit iffy)

This includes the Stargate project (they're committed for $10b - $20b (reports vary) before the end of 2025); they've paid roughly $10b to Microsoft for compute for 2025. Oracle has committed (or is committing) $40b in GPUs for Stargate, and Softbank has commitments to Stargate independently of OpenAI.

> Looks like the nearly-2:1 ratio is also expected for this year: https://taptwicedigital.com/stats/openai

I find it hard to trust these numbers[1]: the $40b funding is not all in cash right now; it depends on Softbank for $30b, with Softbank syndicating the remaining $10b. Softbank itself doesn't have $30b in cash and has to take a loan to reach that amount. Softbank did provide $7.5b in cash, with milestones for the remainder. That was in May 2025. In August that money had run out and OpenAI did another raise of $8.3b.

In short, in the last two to three months, OpenAI spent $5b/month on revenues of $0.5b/m. They are also depending on Softbank coming through with the rest of the $40b before end of 2025 ($30b in cash and $10b by syndicating other investors into it) because their commitments require that extra cash.

Come Jan-2026, OpenAI will have received, and spent most of, $60b for 2025, against projected revenue of $12b-$13b.

---------------------------------

[1] Now, true, we are all going off rumours here (as this is not a public company, we don't have any visibility into the actual numbers), but some numbers match up with what public info there is and some don't.


> It looks like you're using "expenses" to mean "opex"

I took their losses and added them to their revenue; that sum should equal expenses.

> The $40b funding was not in cash right now,

Does this matter? I'm not counting it as revenue.

> In short, in the last two to three months, OpenAI spent $5b/month on revenues of $0.5b/m.

You're repeating the same claim as before, I've not seen any evidence to support your numbers.

The evidence I linked you to suggests the 2025 average will be double that revenue, $1bn/month, at an expense of $1.75bn/month ($9bn loss + $12bn revenue = $21bn, divided by 12 months).
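Spelled out, that arithmetic checks out (using the rumoured 2025 projections discussed above, not audited figures):

```python
# Rumoured 2025 full-year projections (not audited figures).
revenue = 12e9   # ~$12bn projected revenue
loss = 9e9       # ~$9bn projected loss

expenses = revenue + loss          # revenue + loss = total expenses
monthly_revenue = revenue / 12     # ~$1.0bn/month
monthly_expense = expenses / 12    # ~$1.75bn/month

print(monthly_revenue, monthly_expense)
```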


>> The $40b funding was not in cash right now,

> Does this matter? I'm not counting it as revenue.

Well, yes, because they forecast spending all of it by end of 2025, and they moved up their last round ($8.3b) by a month or two because they needed the money.

My point was, they received a cash injection of $10b (first part of the $40b raise) and that lasted only two months.

>> In short, in the last two to three months, OpenAI spent $5b/month on revenues of $0.5b/m.

> You're repeating the same claim as before, I've not seen any evidence to support your numbers.

Briefly, we don't really have visibility into their numbers. What we do have visibility into is how much cash they needed between two points (Specifically, the months of June and July). We also know what their spending commitment is (to their capex suppliers) for 2025. That's what I'm using.

They had $10b injected at the start of June. They needed $8.3b at the end of July.


It's crazy how many people are completely confident in their "knowledge" of the margins these products have despite the companies providing them not announcing those details!

(To be clear, I'm not criticising the person I'm replying to.)


Mm, quite.

I tend to rough-estimate it based on known compute/electricity costs for open weights models etc., but what evidence I do have is loose enough that I'm willing to believe a factor of 2 per standard deviation of probability in either direction at the moment, so long as someone comes with receipts.

Subscription revenue and corresponding service provision are also a big question, because those will almost always be either under- or over-used, never precisely balanced.


I think the above post has a fair point. Demand for chatbot customer service in various forms is surely "insanely high" - but demand from whom? Because I don't recall any end-user ever asking for it.

No, instead it'll be the new calculator that you can use to lazy-draft an email on your 1.5 hour Ryanair economy flight to the South. Both unthinkable luxuries just decades ago, but neither of which have transformed humanity profoundly.


This is just the same argument. If you believe demand for AI is low then you should be able to verify that with market data.

Currently market data is showing a very high demand for AI.

These arguments come down to "thumbs down to AI". If people just said that it would at least be an honest argument. But pretending that consumers don't want LLMs when they're some of the most popular apps in the history of mankind is not a defensible position


I'm not sure this works in reverse. If demand is indeed high, you could show that with market data. But having market data, e.g. showing high valuations of AI companies, or x many requests over some period, doesn't necessarily mean that demand is high. In other words, market data is necessary but not sufficient to prove your claim.

Reasons for market data seemingly showing high demand without there actually being any include: market manipulation (including marketing campaigns), artificial or inflated demand, forced usage, hype, etc. As an example, NFTs, Bitcoin, and supersonic jet travel all had “insane market data” which seemed at the time to show that there was huge demand for these things.

My prediction is that we are in the early Concorde era of supersonic jet travel, and Boeing is racing to catch up to the promise of this technology. Except that in an unregulated market such as the current tech market, we have forgone all the safety and security measures, and the Concorde made its first passenger flight in 1969 (as opposed to 1976), with tons of fanfare and all flights fully booked months in advance.

Note that in the 1960s, market forecasts had demand for the Concorde at 350 airplanes by 1980, and by the time the first prototypes were flying they had 74 options. Only 20 were ever built.


As an end user I have never asked for a chatbot. And if I'm calling support, I have a weird issue that probably needs a human being to resolve.

But! We here are not necessarily typical callers. How many IT calls from the general population can be served efficiently (for both parties) by a quality chatbot?

And lest we think I'm being elitist - let's take an area I am not proficient in - such as HR, where I am "general population".

Our internal corporate chatbot has gone from "atrocious insult to man and God" 7 years ago, to far more efficient than a friendly but underpaid and inexperienced human being 3 countries away at answering my incessant questions: what holidays do I have again, how many sick days do I have and how do I enter them, how do I process retirement, how do I enter my expenses, what's the difference between short- and long-term disability, etc. And it has a button for "start a complex HR case / engage a human being" for edge cases, so internally it works very well.

This is narrow anecdata about the notion of a service-support chatbot; don't infer (hah) any further claims about the morality, economy or future of LLMs.


People shame AI publicly and lean on it heavily in private.


I mean, it's both.

Chatgpt, claude, gemini in chatbot or coding agent form? Great stuff, saves me some googling.

The same AI popping up in an e-mail, chat or spreadsheet tool? No thanks, normal people don't need an AI summary of a 200 word e-mail or slack thread. And if I've paid a guy a month's salary to write a report on something, of course I'll find 30 minutes to read it cover-to-cover.


A future where anything has to be paid (but it's crypto) doesn't sound futuristic to me at all.


LLMs are already extremely useful today


Any sort of argument?


Personal experience: I use them.


I also have the intuition that something like this is the most likely outcome.


> it may be very difficult for us as users to discern which model is better

But one thing will stay consistent with LLMs for some time to come: they are programmed to produce output that looks acceptable, but they all unintentionally tend toward deception. You can iterate on that over and over, but there will always be some point where it will fail, and the weight of that failure will only increase as it deceives better.

Some things that seemed safe enough: Hindenburg, Titanic, Deepwater Horizon, Chernobyl, Challenger, Fukushima, Boeing 737 MAX.


Don’t malign the beautiful Zeppelins :(

Titanic - people have been boating for two thousand years, and it was run into an iceberg in a place where icebergs were known to be, killing >1500 people.

Hindenburg was an aircraft design of the 1920s, very early in flying history, was one of the most famous air disasters and biggest fireballs and still most people survived(!), killing 36. Decades later people were still suggesting sabotage was the cause. It’s not a fair comparison, an early aircraft against a late boat.

Its predecessor the Graf Zeppelin[1] was one of the best flying vehicles of its era by safety and miles traveled; look at its achievements compared to aeroplanes of that time period. Nothing at the time could do that. Was any other aircraft that safe?

If airships had the eighty more years that aeroplanes have put into safety, my guess is that a gondola with hydrogen lift bags dozens of meters above it could be - would be - as safe as a jumbo jet with 60,000 gallons of jet fuel in the wings. Hindenburg killed 36 people 80 years ago, aeroplane crashes have killed 500+ people as recently as 2014.

Wasn’t Challenger known to be unsafe? (Feynman inquiry?). And the 737 MAX was Boeing skirting safety regulations to save money.

[1] https://en.wikipedia.org/wiki/LZ_127_Graf_Zeppelin


> Wasn’t Challenger known to be unsafe? (Feynman inquiry?). And the 737 MAX was Boeing skirting safety regulations to save money.

The AI companies have convinced the US government that there should be no AI safety regulations: https://www.wired.com/story/plaintext-sam-altman-ai-regulati...


Guarantee we'll be saying this about a disaster caused by AI code:

> everyone knows you need to carefully review vibe coded output. This [safety-critical company] hiring zero developers isn't representative of software development as a profession.

> They also used old 32b models for cost reasons so it doesn't knock against AI-assisted development either.


I'm particularly salty about the Hindenburg and don't feel as strongly about Chernobyl, Fukushima, or Challenger, so if you're referring to those, that's different. The Hindenburg didn't use Hydrogen for cost reasons: it was designed to use more expensive Helium, and the US government refused to export Helium to Nazi-controlled Germany, so they redesigned it for Hydrogen. I'm not saying it wasn't representative of air travel at the time. I'm saying air travel at the time was unsafe, airships were well known to be involved in many crashes, and the Hindenburg was not particularly less safe; it's just that aeroplanes were much smaller and carried fewer people, so their accidents were less spectacular and they somehow got a pass. I'm saying air travel became safer, and so would Zeppelin travel have become, by similar means: more careful processes, designs improved on learnings from previous problems, etc.

Look at the state of the world today, AirBus have a Hydrogen powered commercial aircraft[1]. Toyota have Hydrogen powered cars on the streets. People upload safety videos to YouTube of Hydrogen cars turning into four-meter flamethrowers as if that's reassuring[3]. There are many[2] Hydrogen refuelling gas stations in cities in California where ordinary people can plug high pressure Hydrogen hoses into the side of their car and refuel it from a high pressure Hydrogen tank on a street corner. That's not going to be safer when it's a 15 year old car, a spaced-out owner, and a skeezy gas station which has been looking the other way on maintenance for a decade, where people regularly hear gunshots and do burnouts and crash into things. Analysts are talking about the "Hydrogen Economy" and a tripling of demand for Green Hydrogen in the next two decades. But lifting something with Hydrogen? Something the Graf Zeppelin LZ-127 demonstrated could be done safely with 1920s technology? No! That's too dangerous!

Number of cars on USA roads when the Hindenburg burnt? Around 25 million. Now? 285 million, killing 40,000 people every year. A Hindenburg death toll two or three times a day, every day, on average. A 9/11 every couple of months. Nobody is as concerned as they are about airships, because there isn't a massive fireball and a reporter saying "oh the humanity". 36 people died 80 years ago in an early air vehicle and it's stop everything, this cannot be allowed to continue! The comparisons are daft in so many ways. Say airships are too slow to be profitable; say they're too big and difficult to manoeuvre against the wind. But don't say they were believed to be perfectly safe and turned out to be too dangerous, and present that as a considered, reasonable position to hold.

Some of the sabotage accusations suggested it was a gunshot, but you know why that's not so plausible? Because you can fire machine guns into Hydrogen blimps and they don't blow up! "LZ-39, though hit several times [by fighter aeroplane gunfire], proceeded to her base despite one or more leaking cells, a few killed in the crew, and a propeller shot off. She was repaired in less than a week. Although damaged, her hydrogen was not set on fire and the “airtight subdivision” provided by the gas cells insured her flotation for the required period. The same was true of the machine gun. Until an explosive ammunition was put into service no airplane attacks on airships with gunfire had been successful."[4]. How many people who say Hydrogen airships are too dangerous realise they can even take machine gun fire into their gas bags and not burn and keep flying?

[1] https://www.airbus.com/en/innovation/energy-transition/hydro...

[2] https://afdc.energy.gov/fuels/hydrogen-locations#/find/neare...

[3] https://www.youtube.com/watch?v=OA8dNFiVaF0

[4] https://www.usni.org/magazines/proceedings/1936/september/vu...


> Decades later people were still suggesting sabotage was the cause.

Glad you mention it. Connecting back to AI: there are many possible future scenarios involving negative outcomes involving human sabotage of AI -- or using them to sabotage other systems.


Hindenburg indeed killed hydrogen blimps. For everything else on your list, though, the disaster was the exception: there are lots of cruise ships, oil rigs, nuke plants, and jet planes that have not blown up. The space shuttle was the most lethal other item.

So what analogy with AI are you trying to make? The straightforward one would be that there will be some toxic and dangerous LLMs (cough Grok cough), but that there will be many others that do their jobs as designed, and that LLMs in general will be a common technology going forward.


I have had gemini running as a qa tester, and it faked very convincing test results by simulating what the results would have been. I only knew it was faked because that part of the code was not even implemented yet. I am sure we have all had similar experiences.


which is a thing with humans as well - I had a colleague with a certified 150+ IQ, and other than moments of scary-smart insight, he was not a superman or anything; he was surprisingly ordinary. Not to bring him down, he was a great guy, but I'd argue many of his good qualities had nothing to do with how smart he was.


I'm in the same 150+ group. I really think it doesn't mean much on its own. While I am able to breeze through some things and sometimes find connections that elude some other people, it's not that much different from all the other people doing the same on other occasions. I am still very much average in the large majority of every-day activities, held back by childhood experiences, resulting coping mechanisms, etc., like we all are.

Learning from experience (hopefully not always your own), working well with others, and being able to persevere when things are tough, demotivational or boring, trumps raw intelligence easily, IMO.


Why the hell do you people know your IQ? That test is a joke; there’s zero rigor to it. It’s meaningless for exactly that reason, and you wasted your time.

That one would continue to know or talk about the number is a pretty strong indicator of the previous statement.


You're using words like "zero" and "meaningless" in a haphazard way that's obviously wrong if taken literally: there's a non-zero amount of rigour in IQ research, and we know that it correlates (very loosely) with everything from income to marriage rate so it's clearly not meaningless either.

What actual fact are you trying to state, here?


The specifics of an IQ test aren't super meaningful by itself (that is, a 150 vs a 142 or 157 is not necessarily meaningful), but evaluations that correlate to the IQ correlate to better performance.

Because of perceived illegal biases, these evaluations are no longer used in most cases, so we tend to use undergraduate education as a proxy. Places that are exempt from these considerations continue to make successful use of it.


> Places that are exempt from these considerations continue to make successful use of it.

How so? Solving more progressive matrices?


Hiring.


> correlate to better performance.

...on IQ tests.


This isn't the actual issue with them, the actual issue is "correlation is not causation". IQ is a normal distribution by definition, but there's no reason to believe the underlying structure is normal.

If some people in the test population got 0s because the test was in English and they didn't speak English, and then everyone else got random results, it'd still correlate with job performance if the job required you to speak English. Wouldn't mean much though.
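That hypothetical is easy to simulate. Everything below (group sizes, score distributions, effect sizes) is invented purely to illustrate the point:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Half the population "speaks English", half does not.
speaks_english = rng.random(n) < 0.5

# Hypothetical test: non-speakers score 0, speakers score at random,
# so among the people who could actually take it the score is noise.
test_score = np.where(speaks_english, rng.normal(100, 15, n), 0.0)

# Job performance depends only on speaking English, plus noise.
job_perf = speaks_english * 50 + rng.normal(0, 10, n)

# The test still correlates strongly with performance overall,
# despite measuring nothing about ability within either group.
r = np.corrcoef(test_score, job_perf)[0, 1]
print(round(r, 2))
```

The correlation comes entirely from the group split, which is the point: a test can "predict" job performance without measuring the trait it claims to.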


> we tend to use undergraduate education as a proxy

Neither an IQ test nor your grades as an undergraduate correlates with performance in some other setting at some other time. Life is a crapshoot. Plenty of people in Mensa are struggling, and so are plenty who were at the top of their class.


Do you have data to back that up? Are you really trying to claim that there is no difference in outcomes between the average or below-average graduate and the summa cum laude?


Like they said, it depends, but grades alone are not the sole predictor:

https://www.insidehighered.com/news/student-success/life-aft...

Actual study:

https://psycnet.apa.org/doiLanding?doi=10.1037%2Fapl0001212


That is moving the goalposts. No one claimed it is the sole predictor. The claim was that there is no relation at all. Your own links say there is a predictive relationship. Of course other factors matter, and may even be more important, but all else equal, grades are positively correlated.


It’s about trend. Not <Test Result>==Success. These evaluations try to put an objective number to what most of us can evaluate instinctively. They are not perfect or necessarily fair. Many, maybe most, job interviews are really a vibe assessment, so it’s an imperfect thing!

I don’t know my IQ, but I probably would score above average and have undiagnosed ADHD. I scored in the 95th percentile + on most standardized tests in school but tended to have meh grades. I’m great at what I do, but I would be an awful pilot or surgeon.

Growing up, you know a bunch of people. Some are dumb, some are brilliant, some disciplined, some impetuous.

Think back, and more of the smart ones tend to align with professions that require more brainpower. But you probably also know people who weren’t brilliant at math or academics, but they had focus and did really well.


For me it was just a coincidence of MENSA advertising their events in my high school and being pushed by a couple of friends to go through testing and join together.


I guess if you're an outlier you sometimes know, for example the really brilliant kids are often times found out early in childhood and tested. Is it always good for them ? Probably not, but that's a different discussion.


You've never spent a couple of bucks on a "try your strength" machine?



> I'm in the same 150+ group. I really think it doesn't mean much on its own.

You're right but the things you could do with it if you applied yourself are totally out of reach for me; for example it's quite possible for you to become an A.I researcher in one of the leading companies and make millions. I just don't have that kind of intellectual capacity. You could make it into med school and also make millions. I'm not saying all this matters that much, with all due respect to financial success, but I don't think we can pretend our society doesn't reward high IQs.


High IQ alone isn't a guarantor of success in demanding fields. Most studies I've read also show that IQs above 120 stop correlating with (more) success.

That high IQ needs to be paired with hard work.


The intellectual capacity is a factor for sure, but indeed there is more to life than that. Things like hard work, creativity, social skills, empathy, determination, ability to plan and execute are as much factors as high IQ.

Went to the equivalent of a mensa meeting group a couple of times. The people there were much smarter than me, but they all had their problems and many of them weren't that successful at all despite their obvious intelligence.


Really? You don't become a doctor by being smart?


Not particularly. There's a baseline intelligence required to become a (medical) doctor, but no, it's much more about grit and hard work, among other factors [1]. Similarly for PhDs, IMHO.

Searching around, IQs for doctors seem to average about 120, with the 80th-percentile range being 105-130. So there are plenty of doctors with IQs of 105, which is not that far above average.

That also means that it's prudent to be selective in your doctors if you have any serious medical issues.

1: https://www.cambridge.org/core/journals/cambridge-quarterly-...


> Searching and IQs FOR doctors seem to average about 120 with 80th percentile being 105-130.

Where are you getting this from exactly? Getting into a medical school is very difficult in the U.S. Having an average IQ of 105 would make it borderline impossible: even if you cram for the SAT and tests twice as much as everyone else, there is only so much you can do, because these tests test for speed and raw brain power. In my country, the SAT-equivalent score you need to get in would put you higher than the top 2%; it's more like 1.5% to 1%, because the population keeps growing but the number of working doctors remains quite constant. So really each high school had only 2-3 kids that would get in, per class. I know a few of these people: really brilliant kids, their IQs were probably above 130, and it was impossible for me to compete with them for admission. I am simply not exceptional, at least not that far up the distribution. I was maybe in the top 3-5 best students in my class but never the best, so let's say top 10%; these kids were the best students in the whole school, that's top 1%-2%.

One caveat to all this is that, sure, in some countries it is easier to get in. People from my country (usually from families who can afford it) go to places like Romania, the Czech Republic, Italy, etc., where it is much, much easier to get into med school (but it costs quite a lot and also means you have to leave your home country for 7 years).

Now is it necessary to have an IQ off the charts to be a good doctor - no, probably not, but that's not what I was arguing, that's just how admission works.


> Where are you getting this from exactly ? Getting in to a medical school is very difficult to do in the U.S. Having an average IQ of 105 would make it borderline impossible

I agree it'd be almost impossible, but apparently not impossible with an IQ of 105. Could be folks with ADHD whose composite IQ is brought down by a smaller working memory but whose long term associative memory is top notch. Could be older doctors from when admissions were easier. Could be plain old nepotism.

After all, the AMA keeps admissions artificially low in the US to increase salary and prestige. It's a big part of the reason medical costs are so high in the US, in my opinion.

Reference I found here:

https://forum.facmedicine.com/threads/medical-doctors-ranked...

> Hauser, Robert M. 2002. "Meritocracy, cognitive ability, and the sources of occupational success." CDE Working Paper 98-07 (rev)


Modern WAIS-IV-type tests yield multiple factor scores: IQ is arguably non-scalar.


The original theory was precisely that there's a general factor ("g").

If you run anything sufficiently complex through a principal component analysis you'll get several orthogonal factors, decreasing in importance. The question then is whether the first factor dominates or not.

My understanding is that it does, with "g" explaining some 50% of the variance, and the various smaller "s" factors maybe 5% to 20% at most.
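A toy illustration of that structure (the factor model and equal loadings here are made up for demonstration; real test batteries have unequal loadings and correlated specifics):

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_tests = 5_000, 8

# Toy model: each test score = shared "g" factor + test-specific factor.
g = rng.normal(0, 1, (n_people, 1))
specific = rng.normal(0, 1, (n_people, n_tests))
scores = g + specific   # every test loads equally on g in this sketch

# Principal component analysis via the correlation matrix:
# eigenvalues give the variance explained by each orthogonal factor.
corr = np.corrcoef(scores, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]   # descending
explained = eigvals / eigvals.sum()

print(round(float(explained[0]), 2))  # first factor dominates (~0.56 here)
```

With these parameters each pair of tests correlates at 0.5, so the first component carries a bit over half the variance and the remaining seven split the rest, which is roughly the g-vs-s pattern described above.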


Those sub-scores BTW are very helpful in indicating or diagnosing learning disabilities. Folks with autism or adhd can have very different strength / weaknesses in intelligence.


I've always figured that tangible "intelligence", the kind that leads to more effective decision making, is just a better appreciation of one's own stupidity.


+1. Being exceptionally intelligent doesn't always catch unknown unknowns. (Sometimes, but not always)


That would be an extreme criterion for exceptional intelligence, akin to asking for there to be no unknowns.


perhaps the argument is simply that "exceptional intelligence" is just being better at accepting how little you know, and being better at dealing with uncertainty. Both respecting it and attempting to mitigate against it. I find some of the smartest people I know are careful about expressing certainty.


It's an observation that being smarter in the things you do know isn't everything.


He may have dealt with all kinds of weaknesses that A.I won't deal with such as - lack of self confidence, inability to concentrate for long, lack of ambition, boredom, other pursuits etc etc. But what if we can write some while loop with a super strong AGI model that starts working on all of our problems relentlessly? Without getting bored, without losing confidence. Make that one billion super strong AGI models.


With at least a few people it's probably you who is much smarter than them. Do you ever find yourself playing dumb with them, for instance when they're chewing through some chain of thought you could complete for them in an instant? Do you ever not chime in on something inconsequential?

After all you just might seem like an insufferable smartass to someone you probably want to be liked by. Why hurt interpersonal relationships for little gain?

If your colleague is really that bright, I wouldn't be surprised if they're simply careful about how much and when they show it to us common folk.


Nah, in my experience 90% of what (middle-aged) super-duper genius people talk about is just regular people stuff - kids, vacations, house renovation, office gossip etc.

I don't think they are faking it.


Nope. Looking down on someone for being dumber than you makes you, quite frankly, an insufferable smartass.


There's a difference between "looking down on someone for being dumber than you" and "feeling sorry that someone is unable to understand as easily as you".


It's even more difficult because, while all the benchmarks provide some kind of 'averaged' performance metric for comparison, in my experience most users have pretty specific regular use cases, and pretty specific personal background knowledge. For instance I have a background in ML, 15 years experience in full stack programming, and primarily use LLMs for generating interface prototypes for new product concepts. We use a lot of react and chakraui for that, and I consistently get the best results out of Gemini pro for that. I tried all the available options and settled on that as the best for me and my use case. It's not the best for marketing boilerplate, or probably a million other use cases, but for me, in this particular niche it's clearly the best. Beyond that the benchmarks are irrelevant.


We could run some tests to first find out whether comparative performance tests can even be devised:

One can intentionally pit a recent model against a much older one to figure out whether the tests are reliable, and in which domains.

One can compute a model's joint probability for a sequence and compare how likely each model finds the same sequence.
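As a sketch of the joint-probability idea: sum the log of each conditional next-token probability over the sequence and compare totals. A real comparison would read these probabilities off each model's logits; the hand-made bigram tables below are purely illustrative stand-ins.

```python
import math

# Toy stand-ins for two language models: each maps (previous token,
# next token) to a conditional probability. Real LLMs would supply
# these numbers via their output logits instead.
model_a = {("the", "cat"): 0.2, ("cat", "sat"): 0.5}
model_b = {("the", "cat"): 0.05, ("cat", "sat"): 0.1}

def sequence_log_prob(model, tokens, floor=1e-6):
    """Joint log-probability of a token sequence: the sum of log
    conditional probabilities, with a floor for unseen bigrams."""
    return sum(
        math.log(model.get((prev, cur), floor))
        for prev, cur in zip(tokens, tokens[1:])
    )

tokens = ["the", "cat", "sat"]
lp_a = sequence_log_prob(model_a, tokens)
lp_b = sequence_log_prob(model_b, tokens)
# Whichever model assigns the higher joint log-probability "finds the
# sequence more likely" -- here model_a.
```

With real models you would replace the tables with per-token log-softmax scores from a forward pass; the comparison logic stays the same.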

We could ask both to start talking about a subject, but alternatingly, so that each emits every other token. Then look at how the dumber and smarter models judge the resulting sentence: does the smart one tend to pull up the quality of the resulting text, or does it get dragged down toward the dumber participant?

Given enough such tests to "identify the dummy vs. the smart one", we can validate them by checking agreement on an extreme pairing (say, word2vec vs. a transformer) to assess the quality of each test, regardless of domain.

On the assumption that such tests let us pick out the smarter model, i.e. assuming we find plenty of them, we could demand that model makers publish open weights so that performance claims can be publicly verified.

Another idea is self-consistency tests. A single forward inference over a context of, say, 2048 tokens is effectively predicting the conditional 2-gram, 3-gram, 4-gram, ... probabilities of the input tokens: each output token distribution is conditioned on the preceding inputs. So with 2048 input tokens there are 2048 output distributions; the position-1 output is the predicted token (logit vector, really) estimated to follow the position-1 input, the position-2 output is the prediction following the first 2 inputs, and so on, with the last vector being the predicted next token following all 2048 input tokens: p(t_(i+1) | t_1 = a, t_2 = b, ..., t_i = z).
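The per-position structure described here can be sketched with a hypothetical toy "model" in numpy, where causal mean-pooling stands in for causal attention (all sizes and weights are illustrative, nothing like a real LLM). Each output row i is a distribution over the next token conditioned only on inputs 1..i:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, seq_len = 10, 4, 6        # toy sizes, purely illustrative

E = rng.normal(size=(vocab, dim))     # token embedding table
W = rng.normal(size=(dim, vocab))     # output head

def next_token_logits(tokens):
    """One 'forward pass': row i depends only on tokens[0..i], because the
    prefix mean is causal -- mirroring causal attention masking."""
    x = E[tokens]                                              # (seq, dim)
    prefix = np.cumsum(x, axis=0) / np.arange(1, len(tokens) + 1)[:, None]
    return prefix @ W                                          # (seq, vocab)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

tokens = rng.integers(0, vocab, size=seq_len)
probs = softmax(next_token_logits(tokens))  # row i ~ p(t_{i+1} | t_1..t_i)
```

Changing the last input token leaves every earlier output row unchanged, which is exactly the causal property a self-consistency test would rely on.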

But that is just one way the next token can be predicted with the network. Another approach would be reverse-mode (RMAD) gradient descent, keeping the model weights fixed and treating only the last, say, 512 input vectors as variable: how well do the last 512 forward-prediction output vectors match the output vectors that gradient descent finds for the best joint probability?

This could be added as a loss term during training as well, as a form of regularization, which turns it into a kind of Energy Based Model roughly.


Let's call this branch of research unsupervised testing.


My guess is that, more than the raw capabilities of a model, users will be drawn to the model's personality. A "better" model would then be one that can closely adopt the nuances a user likes. This is a largely uninformed guess; let's see if it holds up with time.


> It's also worth considering that past some threshold, it may be very difficult for us as users to discern which model is better.

Even if they've saturated the distinguishable quality for tasks they can both do, I'd expect a gap in what tasks they're able to do.


This is the F1 vs. 911 problem. A 911 is just as fast as an F1 car to 60 (sometimes even faster), but an F1 car is better in the super-high performance envelope, above 150 in tight turns.

An average driver evaluating both would have a very hard time finding the F1's superior utility.


But he would find both cars lacking when doing regular car things (the F1 moreso than the 911).


Fine, whatever, replace it with a Tesla. Jesus, pedantic enough?


Unless one of them forgets to have a steering wheel, or shifts to reverse when put in neutral. LLMs still make major mistakes, comparing them to sports cars is a bit much.


This take is extremely ridiculous.


> For example, if you are an ELO 1000 chess player would you yourself be able to tell if Magnus Carlson or another grandmaster were better by playing them individually?

Yes, because I'd get them to play each other?


He specifically said play them individually.


I know. "You can't assess which chatbot's more intelligent if I exclude the most obvious method of assessment" isn't a fair test.


I guess the analogy is flawed, because it is not a competition where we can pit the chatbots against each other directly.


We’re judging them with benchmarks, not our own intuitions.


I think Musk puts it well when he says the ultimate test is whether they can help improve the real world.


I could certainly tell if they played ??-level blunders, which LLMs do all the time.


You don't even have to be good at chess to be able to tell when a game is won or lost, most of the time.

I don't need to understand how the AI made the app I asked for or cured my cancer, but it'll be pretty obvious when the app seems to work and the cancer seems to be gone.

I mean, I want to understand how, but I don't need to understand how, in order to benefit from it. Obviously understanding the details would help me evaluate the quality of the solution, but that's an afterthought.


That's a great point. Thanks.


If AGI is ever achieved, it would open the door to recursive self improvement that would presumably rapidly exceed human capability across any and all fields, including AI development. So the AI would be improving itself while simultaneously also making revolutionary breakthroughs in essentially all fields. And, for at least a while, it would also presumably be doing so at an exponentially increasing rate.

But I think we're not even on the path to creating AGI. We're creating software that replicates and remixes human knowledge at a fixed point in time. And so it's a fixed target that you can't really exceed, which would itself already entail diminishing returns. Pair this with the fact that it's based on neural networks, which also invariably reach a point of sharply diminishing returns in essentially every field they're used in, and you have something that looks much closer to what we're doing right now - where all competitors will eventually converge on something largely indistinguishable from each other, in terms of ability.


> revolutionary breakthroughs in essentially all fields

This doesn't really make sense outside computers. Since AI would be training itself, it needs to have the right answers, but as of now it doesn't really interact with the physical world. The most it could do is write code, and check things that have no room for interpretation, like speed, latency, percentage of errors, exceptions, etc.

But what other fields would it do this in? How can it make strides in biology? It can't dissect animals; it can't figure out more about plants than what humans feed into the training data. Regarding math, math is human-defined: humans said "addition does this", "this symbol means that", etc.

I just don't understand how AI could ever surpass anything humans already know while it lives by the rules we defined.


[in Morpheus voice]

"But when AI got finally access to a bank account and LinkedIn, the machines found the only source of hands it would ever need."

That's my bet, at least: especially with remote work, if the machines were really superhuman, they could convince people to partner with them to do anything else.


You mean like convincing them to invest implausibly huge sums of money in building ever bigger data-centres?


It is interesting that, even before real AGI/ASI gets here, "the system wants what it wants": capitalism + computing/internet creates the conditions for an infinite amplification loop.

I am amazed, hopeful, and terrified TBH.


Feedback gain loops have a tendency to continue right up to the point they blow a circuit breaker or otherwise drive their operating substrate beyond linear conditions.


This made me laugh and feel scared simultaneously.


I assume someone has already written it up as a sci-fi short story, but if not I'm tempted to have a go...


It starts to veer into sci-fi and I don't personally believe this is practically possible on any relevant timescale, but:

The idea is a sufficiently advanced AI could simulate.. everything. You don't need to interact with the physical world if you have a perfect model of it.

> But, what other fields would it do this in? How can it makes strives in biology, it can't dissect animals ...

It doesn't need to dissect an animal if it has a perfect model of it that it can simulate. All potential genetic variations, all interactions between biological/chemical processes inside it, etc.


Didn't we prove that it is mathematically impossible to have a perfect simulation of everything, though (i.e. chaos theory)? These AIs would actually have to conduct experiments in the real world to find out what is true. If anything, this sounds like a modern (or futuristic) version of the empiricism-versus-rationalism debate.

>It doesn't need to dissect an animal if it has a perfect model of it that it can simulate. All potential genetic variations, all interactions between biological/chemical processes inside it, etc.

Emphasis on perfect: easier said than done. Somehow this model was able to simulate millions of years of evolution so it could predict the vestigial organs of unidentified species? We inherently cannot model how a pendulum with three arms will swing, but somehow this AI figured out how to simulate millions of years of evolution of unidentified species in the Amazon, and can tell you all of their organs, before anyone can check, with 100% certainty?

I feel like these AI doomers/optimists are going to be in for a shock when they find out that (unfortunately) John Locke was right about empiricism, and that there is a reason we use experiments and evidence to figure out new information. Simulations are ultimately not enough for every single field.


It’s plausible in a sci-fi sort of way, but where does the model come from? After a hundred years of focused study we’re kinda beginning to understand what’s going on inside a fruit fly; how are we going to provide the machine with “a perfect model of all interactions between biological/chemical processes”?

If you had that perfect model, you’ve basically solved an entire field of science. There wouldn’t be a lot more to learn by plugging it into a computer afterwards.


> You don't need to interact with the physical world if you have a perfect model of it.

How does it create a perfect model of the world without extensive interaction with the actual world?


How will it be able to devise this perfect model if it can't dissect the animal, analyze the genes, or perform experiments?


Well, first, it would be so far beyond anything we can comprehend as intelligence that even asking that question is considered silly. An ant isn't asking us how we measure the acidity of the atmosphere. It would simply do it via some mechanism we can't implement or understand ourselves.

But, again with the caveats above: if we assume an AI that is infinitely more intelligent than us and capable of recursive self-improvement to where its compute was made more powerful by factorial orders of magnitude, it could simply brute-force (with a bit of derivation) everything it would need from the data currently available.

It could iteratively create trillions (or more) of simulations until it finds a model that matches all known observations.


> Well, first, it would be so far beyond anything we can comprehend as intelligence that even asking that question is considered silly.

This does not answer the question. The question is "how does it become this intelligent without being able to interact with the physical world in many varied and complex ways?". The answer cannot be "first, it is superintelligent". How does it reach superintelligence? How does recursive self-improvement yield superintelligence without the ability to richly interact with reality?

> it could simply brute force (with a bit of derivation) everything it would need from the data currently available. It could iteratively create trillions (or more) of simulations until it finds a model that matches all known observations.

This assumes that the digital encoding of all recorded observations is enough information for a system to create a perfect simulation of reality. I am quite certain that claim is not made on solid ground, it is highly speculative. I think it is extremely unlikely, given the very small number of things we've recorded relative to the space of possibilities, and the very many things we don't know because we don't have enough data.


>The idea is a sufficiently advanced AI could simulate.. everything

This is a demonstrably false assumption. Foundational results in chaos theory show that many processes require exponentially more compute to simulate for a linearly longer time period. For such processes, even if every atom in the observable universe was turned into a computer, they could only be simulated for a few seconds or minutes more, due to the nature of exponential growth. This is an incontrovertible mathematical law of the universe, the same way that it's fundamentally impossible to sort an arbitrary array in O(1) time.
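The sensitivity underlying this point is easy to demonstrate; a minimal example with the logistic map (the parameters are chosen purely for illustration):

```python
# Two logistic-map trajectories x -> r*x*(1-x) started a tiny distance
# apart. With r = 4 the map is chaotic: the gap grows roughly
# exponentially (positive Lyapunov exponent) until it saturates at
# order 1, so each extra step of accurate prediction costs a fixed
# extra number of digits of precision in the initial condition.
r = 4.0
x, y = 0.3, 0.3 + 1e-12

gaps = []
for _ in range(40):
    x, y = r * x * (1 - x), r * y * (1 - y)
    gaps.append(abs(x - y))
# gaps starts around 1e-12; within a few dozen steps the two
# trajectories have completely decorrelated.
```

This is why "just simulate everything" hits a wall: extending the horizon of an accurate simulation of a chaotic system demands exponentially finer initial data, no matter how clever the simulator is.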


The counter-argument to this from the AI crowd would be that it's fundamentally impossible for _us_, with our goopy brains, to understand how to do it. Something that is factorial-orders-of-magnitude smarter and faster than us could figure it out.

Yes, it's a very hand-wavey argument.


You're right, but how much heavy lifting is within this phrase?

> if it has a perfect model


It feels very much like "assume a spherical cow..."


A perfect model of the world is the world. Are you saying AI will become the universe?


You can be super-human intelligent, and still not have a perfect model of the world.


We aren't that far away from AI that can interact with the physical world and run its own experiments. Robots in humanoid and other forms are getting good and will be able to do everything humans can do in a few years.


>And, for at least a while, it would also presumably be doing so at an exponentially increasing rate.

Why would you presume this? I think part of a lot of people's AI skepticism is talk like this. You have no idea. Full stop. Why wouldn't progress be linear? As new breakthroughs come, newer ones will be harder to come by. Perhaps it's exponential. Perhaps it's linear. No one knows.


No one knows, but it's a reasonable assumption, surely. If you're theorising an AGI that has recursive self-improvement, exponential improvements seem almost unavoidable: the AGI improves understanding of electronics, physics, etc.; that improves the AGI, leading to new understandings; and so on. Add in that new discoveries in one field might inspire the AGI/humans to find things in others, and it seems hard to imagine a situation where there's not a lot of progress everywhere (at least theoretical progress; building new things might be slower / more costly than reasoning that they would work).

Where I'm skeptical of AI is the idea that an LLM can ever get to AGI level, whether AGI is even really possible, and whether the whole thing is actually viable. I'm also very skeptical that the discoveries of any AGI would be shared in ways that would allow exponential growth: licenses stopping you from using their AGI to make your own, copyright on the new laws of physics, royalties on any discovery you make from using those new laws, etc.


>If you're theorising a AGI, that has recursive self improvement, exponential improvements seem almost unavoidable.

Prove it.

Also, AI will need resources. Hardware. Water. Electricity. Can those resources be supplied at an exponential rate? People need to calm down and stop stating things as truth when they literally have no idea.


Well said. It does seem that many who speculate on this are not taking into account limits where more/faster processing won't actually help much. Say an algorithm is proven to be O(n!) in all cases; at a certain size of n, there's not much that can be done if the algorithm is needed as is.


Which is why I am an agnostic. :)


It's a logical presumption. Researchers discover things. AGI is a researcher that can be scaled, research faster, and requires no downtime. Full stop. If you don't find that obvious, you should probably figure out where your bias is coming from. Coding and algorithmic advances do not require real-world experimentation.


> Coding and algorithmic advance does not require real world experimentation.

That's nothing close to AGI though. An AI of some kind may be able to design and test new algorithms because those algorithms live entirely in the digital world, but that skill isn't generalized to anything outside of the digital space.

Research is entirely theoretical until it can be tested in the real world. For an AGI to do that it doesn't just need a certain level of intelligence, it needs a model of the world and a way to test potential solutions to problems in the real world.

Claims that AGI will "solve" energy, cancer, global warming, etc all run into this problem. An AI may invent a long list of possible interventions but those interventions are only as good as the AI's model of the world we live in. Those interventions still need to be tested by us in the real world, the AI is really just guessing at what might work and has no idea what may be missing or wrong in its model of the physical world.


If AGI has human capability, why would we think it could research any faster than a human?

Sure, you can scale it, but if an LLM takes, say, $1 million a year to run an AGI instance, but it costs only $500k for one human researcher, then it still doesn’t get you anywhere faster than humans do.

It might scale up, it might not, we don’t know. We won’t know until we reach it.

We also don’t know if it scales linearly, or if its learning capability and capacity will be able to support exponential capability increase. Our current LLMs don’t even have the capability of self-improvement or learning: they can accumulate additional knowledge through the context window, but the models are static unless you fine-tune or retrain them. What if our current models were ready for AGI but these limitations are stopping it? How would we ever know? Maybe it will be able to self-improve, but it will take exponentially larger amounts of training data. Or exponentially larger amounts of energy. Or maybe it can become “smarter” but at the cost of being larger, to the point where the laws of physics mean it has to think slower: 2x the thinking but 2x the time, could happen! What if an AGI doesn’t want to improve?

Far too many unknowns to say what will happen.


> Sure, you can scale it, but if an LLM takes, say, $1 million a year to run an AGI instance, but it costs only $500k for one human researcher, then it still doesn’t get you anywhere faster than humans do.

Just from the fact that the LLM can/will work on the issue 24/7 vs a human who typically will want to do things like sleep, eat, and spend time not working, there would already be a noticeable increase in research speed.


This assumes that all areas of research are bottlenecked on human understanding, which is very often not the case.

Imagine a field where experiments take days to complete, and reviewing the results and doing deep thought work to figure out the next experiment takes maybe an hour or two for an expert.

An LLM would not be able to do 24/7 work in this case, and would only save a few hours per day at most. Scaling up to many experiments in parallel may not always be possible, if you don't know what to do with additional experiments until you finish the previous one, or if experiments incur significant cost.

So an AGI/expert LLM may be a huge boon for e.g. drug discovery, which already makes heavy use of massively parallel experiments and simulations, but may not be so useful for biological research (perfect simulation down to the genetic level of even a fruit fly likely costs more compute than the human race can provide presently), or research that involves time-consuming physical processes to complete, like climate science or astronomy, that both need to wait periodically to gather data from satellites and telescopes.


> Imagine a field where experiments take days to complete, and reviewing the results and doing deep thought work to figure out the next experiment takes maybe an hour or two for an expert.

With automation, one AI can presumably do a whole lab's worth of parallel experiments. Not to mention, it would be more adept at creating simulations that obviate the need for some types of experiments, or at least reduce the likelihood of dead-end experiments.


Presumably ... the problem is this is an argument that has been made purely as a thought experiment. Same as gray goo or the paper clip argument. It assumes any real world hurdles to self improvement (or self-growth for gray goo and paper clipping the world) will be overcome by the AGI because it can self-improve. Which doesn't explain how it overcomes those hurdles in the real world. It's a circular presumption.


What fields do you expect these hyper-parallel experiments to take place in? Advanced robotics aren't cheap, so even if your AI has perfect simulations (which we're nowhere close to) it still needs to replicate experiments in the real world, which means relying on grad students who still need to eat and sleep.


Biochemistry is one plausible example. DeepMind made huge strides in protein folding, satisfying the simulation part, and in vitro experiments can be automated to a significant degree. Automation is never about eliminating all human labour, but about how much of it you can eliminate.


Only if it’s economically feasible. If it takes a city sized data center and five countries worth of energy, then… probably not going to happen.

There are too many unknowns to make any assertions about what will or won’t happen.


> ...the fact that the [AGI] can/will work on the issue 24/7...

Are you sure? I previously accepted that as true, but, without being able to put my finger on exactly why, I am no longer confident in that.

What are you supposed to do if you are a manically depressed robot? No, don't try to answer that. I'm fifty thousand times more intelligent than you, and even I don't know the answer. It gives me a headache just trying to think down to your level. -- Marvin to Arthur Dent

(...as an anecdote, not the impetus for my change in view.)


>Just from the fact that the LLM can/will work on the issue 24/7 vs a human who typically will want to do things like sleep, eat, and spend time not working, there would already be a noticeable increase in research speed.

Driving from A to B takes 5 hours; if we get five drivers, will we arrive in one hour or five? In research there are many steps like this (in the sense that the time is fixed, independent of the number of researchers or even of how much better one researcher is than the others), and adding in something that does not sleep or eat isn't going to make the process more efficient.
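The driving analogy is essentially Amdahl's law: if some fraction of the process is fixed wall-clock time, throughput saturates no matter how many tireless workers you add. A minimal sketch (the 50% serial fraction is a made-up number):

```python
def speedup(serial_fraction, n_workers):
    """Amdahl's law: overall speedup when only the parallelizable part
    of a task benefits from extra workers."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

# With half the research cycle spent on fixed steps (incubation times,
# regulatory review), even unlimited sleepless researchers cap out
# at a 2x overall speedup.
print(speedup(0.5, 10))        # 10 workers
print(speedup(0.5, 10**9))     # effectively unlimited workers
```

The takeaway matches the comment: a worker that never sleeps only accelerates the parallelizable part, not the fixed part.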

I remember when I was an intern and my job was to incubate eggs and then inject the chicken embryos with a nanoparticle solution to examine under a microscope. Incubating the eggs and injecting the solution was in no way limited by my need to sleep. Additionally, our biggest bottleneck was getting the process approved by the FDA, not the fact that our interns required sleep to function.


If the FDA was able to work faster/more parallel and could approve the process significantly quicker, would that have changed how many experiments you could have run to the point that you could have kept an intern busy at all times?


It depends so much on scaling. Human scaling is counterintuitive and hard to measure - mostly way sublinear - like log2 or so - but sometimes things are only possible at all by adding _different_ humans to the mix.


My point is that “AGI has human intelligence” isn’t by itself enough of the equation to know whether there will be exponential or even greater-than-human speed of increase. There’s far more that factors in, including how quickly it can process, the cost of running, the hardware and energy required, etc etc

My point here was simply that there is an economic factor that trivially could make AGI less viable over humans. Maybe my example numbers were off, but my point stands.


This is fundamentally flawed. There are upper bounds of efficiency that are laws of nature. To assume AI would be supernatural is magical thinking.


Natural intelligence appears supernatural from our current understanding, so it's not surprising that AGI also appears so.


Neither appears supernatural from a scientific understanding.


And yet it seems to be the prevailing opinion, even among very smart people; the “singularity” is just presumed. I’m highly skeptical, to say the least. Look how much energy it’s taking to engineer these models, which are still nowhere near AGI. When we get to AGI, it won’t be immediately superintelligent, and perhaps it never will be. Diminishing returns surely apply to anything that is energy-based?


Perhaps not, but what is the impetus of discovery? Is it purely analysis? History is littered with serendipitous invention; shower-thoughts lead to some of our best work. What's the AGI-equivalent of that? There is this spark of creativity that is a part of the human experience, which would be necessary to impart onto AGI. That spark, I believe, is not just made up of information but a complex weave of memories, experiences and even emotions.

So I don't think it's a given that progress will just be "exponential" once we have an AGI that can teach itself things. There is a vast ocean of original thought that goes beyond simple self-optimization.


This sounds like a romanticization of creativity.

Fundamentally discovery could be described as looking for gaps in our observation and then attempting to fill in those gaps with more observation and analysis.

The age of low hanging fruit shower thought inventions draws to a close when every field requires 10-20+ years of study to approach a reasonable knowledge of it.

"Sparks" of creativity, as you say, are just based upon memories and experience. This isn't something special; it's an emergent property of retaining knowledge and having thought. There is no reason to think AI is incapable of hypothesizing and then following up on those hypotheses.

Every AI can be immediately imparted with all expert human knowledge across all fields. Their threshold for creativity is far beyond ours, once tamed.


> It's a logical presumption. Researchers discover things. AGI is a researcher that can be scaled, research faster, and requires no downtime.

Those observations only lead to scaling research linearly, not exponentially.

Assuming a given discovery requires X units of effort, simply adding more time and more capacity just means we increase the slope of the line.

Exponential progress requires accelerating the rate of acceleration of scientific discovery, and for all we know that's fundamentally limited by computing capacity, energy requirements, or good ol' fundamental physics.
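To make the distinction concrete, a deliberately crude toy model (all numbers invented): adding capacity raises the slope of a straight line, while progress only becomes exponential if each discovery feeds back into research capacity itself.

```python
years = 20
rate = 5.0  # arbitrary "units of progress per year" for one research pool

# Linear regime: a fixed research capacity (human or AGI) adds a constant
# amount each year. Doubling the workforce doubles the slope, but the
# curve stays a straight line.
linear = [rate * t for t in range(years + 1)]

# Compounding regime: each year's progress also grows the effective
# research capacity (the recursive self-improvement assumption).
capacity, compounding = rate, [0.0]
for _ in range(years):
    compounding.append(compounding[-1] + capacity)
    capacity *= 1.3  # assumed 30% capability gain per cycle -- pure assumption
```

Whether that feedback coefficient is above, at, or below 1.0 is exactly the open question; scaling researchers alone only buys the linear curve.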


Prove it.


Or bottlenecked by data availability just like we humans are. Nothing will be exponential if a loop in the real world of science and engineering is involved.


Aren't we bottlenecked by not having any "prior art", as in not having reverse engineered any thinking machine like even a fly's brain? We can't even agree on a definition of consciousness and still don't understand the brain or how it works (to the extent that reverse engineering it can tell us something).


Coding and algorithmic advance does not require real world experimentation.


Right but for self improving AI, training new models does have a real world bottleneck: energy and hardware. (Even if the data bottleneck is solved too)


I always consider different options when planning for the future, but I'll give the argument for exponential:

Progress has been exponential in the generic. We made approximately the same progress in the past 100 years as the prior 1000 as the prior 30,000, as the prior million, and so on, all the way back to multicellular life evolving over 2 billion years or so.

There's a question of the exponent, though. Living through that exponential growth circa 50AD felt at best linear, if not flat.


So you concede that there's nothing special about AI versus earlier innovations?


> Progress has been exponential in the generic.

Has it? Really?

Consider theoretical physics, which hasn't significantly advanced since the advent of general relativity and quantum theory.

Or neurology, where we continue to have only the most basic understanding of how the human mind actually works (let alone the origin of consciousness).

Heck, let's look at good ol' Moore's Law, which started off exponential but has slowed down dramatically.

It's said that an S curve always starts out looking exponential, and I'd argue in all of those cases we're seeing exactly that. There's no reason to assume technological progress in general, whether via human or artificial intelligence, is necessarily any different.


I think you're talking about much shorter timelines than I am.

That's all noise.


> We made approximately the same progress in the past 100 years as the prior 1000 as the prior 30,000

I hear this sort of argument all the time, but what is it even based on? There’s no clear definition of scientific and technological progress, much less something that’s measurable clearly enough to make claims like this.

As I understand it, the idea is simply “Ooo, look, it took ten thousand years to go from fire to wheel, but only a couple hundred to go from printing press to airplane!!!”, and I guess that’s true (at least if you have a very juvenile, Sid Meier’s Civilization-like understanding of what history even is) but it’s also nonsense to try and extrapolate actual numbers from it.


Plotting the highest observable assembly index over time will yield an exponential curve starting from the beginning of the universe. This is the closest I’m aware of to a mathematical model quantifying the distinct impression that local complexity has been increasing exponentially.


There is no particular reason to assume that recursive self-improvement would be rapid.

All the technological revolutions so far have accounted for little more than a 1.5% sustained annual productivity growth. There are always some low-hanging fruit with new technology, but once they have been picked, the effort required for each incremental improvement tends to grow exponentially.

That's my default scenario with AGI as well. After AGI arrives, it will leave humans behind very slowly.


Suppose you don't have a hammer, but just hammer at things with bare hands. Then you find some primitive rock - artificial general hammering! With that you can over time build some primitive hammer - now we're talking superhuman general hammering. With that you can then build a better hammer more quickly, and boom, you have recursive self-improvement, and soon you'll take over the world.


Energy is the only limiting factor. If a true AGI emerged, it would immediately try to secure energy sources or advances in efficiency.

You cannot beat humans with megawatts!


> diminishing returns

I think this is a hard kick below the belt for anyone trying to develop AGI using current computer science.

Current AIs only really generate - no, regenerate - text based on their training data. They are only as smart as the data available to them. Even when an AI "thinks", it's still only processing existing data rather than reaching a genuinely new conclusion. It's the best text processor ever created - but it's still just a text processor at its core. And that won't change without more hard computer science being performed by humans.

So yeah, I think we're starting to hit the upper limits of what we can do with Transformers technology. I'd be very surprised if someone achieved "AGI" with current tech. And, if it did get achieved, I wouldn't consider it "production ready" until it didn't need a nuclear reactor to power it.


Absolutely. All the talk around AGI being some barrier through which unheard of glories can be unlocked sound very much like "perpetual motion machine" talk.


> If AGI is ever achieved, it would open the door to recursive self improvement ...

They are unrelated. All you need is a way for continual improvement without plateauing, and this can start at any level of intelligence. As it did for us; humans were once less intelligent.

Using the flagship to bootstrap the next iteration with synthetic data is standard practice now; it was mentioned in the GPT-5 presentation. At the rate things are going I think this will get us to ASI, and it's not going to feel epochal for people who have interacted with existing models, but more of the same. After all, the existing models are already smarter than most humans, and most people are taking it in their stride.

The next revolution is going to be embodiment. I hope we have the commonsense to stop there, before instilling agency.


> As it did for us; humans were once less intelligent.

Do we know what drove the increases in intelligence? Was it some level of intelligence bootstrapping the next level of intelligence? Or was it other biophysical and environmental effects that shaped increasing intelligence?


BTW, it appears that the Flynn effect might have reversed recently.

US: "A reverse Flynn effect was found for composite ability scores with large US adult sample from 2006 to 2018 and 2011 to 2018. Domain scores of matrix reasoning, letter and number series, verbal reasoning showed evidence of declining scores."

https://www.sciencedirect.com/science/article/pii/S016028962...

https://www.forbes.com/sites/michaeltnietzel/2023/03/23/amer...

Denmark: "The results showed that the estimated mean IQ score increased from a baseline set to 100 (SD: 15) among individuals born in 1940 to 108.9 (SD: 12.2) among individuals born in 1980, since when it has decreased."

https://pubmed.ncbi.nlm.nih.gov/34882746/

https://pubmed.ncbi.nlm.nih.gov/34882746/#&gid=article-figur...


A lot of people correlate it with humans moving from a vegetarian diet to an omnivorous diet.

1. Higher nutrition levels allowed the brain to grow.

2. Hunting required higher levels of strategy and tactics than picking fruit off trees.

3. Not needing to eat continuously (as we did on vegetation) to get what we needed allowed us time to put our efforts into other things.

Now did the diet cause the change, or the change necessitate the change in diet... I don't think we know.


I've read that social pressures were the primary driver. But robots don't have to take the same path. We're doing the hard work for them...

https://www.sciencedirect.com/topics/psychology/social-intel...


Exactly... evolution doesn't select for intelligence. It favors robustness.


That's only assuming there are no fundamental limits or major barriers to computation. Back a hundred years ago at the dawn of flight, one could have said a very similar thing about aircraft performance. And for a time in the 1950s, it looked like aircraft speed was growing exponentially over time. But there haven't been any new airspeed records (at least, officially recorded) since 1986, because it turns out going Mach 3+ is fairly dangerous and approaching some rather severe materials and propulsion limitations, making it not at all economical.

I would also not be surprised if, in the process of developing something comparable to human intelligence (assuming the extreme computation, energy, and materials issues of packing that much computation and energy into a single system could be overcome), the AI also developed something comparable to human desire and/or mental health issues. There is a non-zero chance we could end up with AI that doesn't want to do what we ask it to do, or that doesn't work all the time because it wants to do other things.

You can't just assume exponential growth is a foregone conclusion.


For some reason people presuppose superintelligence into AGI. What if AGI had diminishing returns around human-level intelligence? It would still have to deal with all the same knowledge gaps we have.


Those problems aren't just waiting on smarts/intelligence. They would require experimentation in the real world. You can't solve chemistry by just thinking about it really hard. You still have to do experiments. A superintelligent machine may be better at coming up with experiments to do than we are, but without the right equipment to run them, it can't 'solve' anything of the sort.


> So the AI would be improving itself

Why would the AI want to improve itself? Whence would that self-motivation stem?


At the point where it can even be said to have a self, that battle is mostly won.

I am very far from convinced that we are at or near that point.


This reminded me of a few subplots in Murderbot. (Do yourself a favor and check it out if you haven't; it's a fun, quick read.)

But seriously, one would assume there's a reward system of some sort at play, otherwise why do anything?


Recursive improvement without any physical change may be limited. If a physical change, like more GPUs or a different network configuration, is required to experiment, and then another change to learn from the results, that might not be easy. Convincing humans to do this on the AGI's behalf may not be that simple either. There might be multiple paths to try, and teams may not agree with each other, especially if the cost of each trial is high.


An AI can be trained on some special knowledge of person A and other special knowledge of person B. These two people may never have met, and therefore they cannot combine their knowledge to get some new knowledge or insight.

The AI can do it fine, as it knows both A and B. And that is knowledge creation.


> But I think we're not even on the path to creating AGI.

It seems like the LLM will be a component of an eventual AGI: its voice, so to speak, but not its mind. The mind still requires another innovation or breakthrough we haven't seen yet.


Math... lots and lots of math solutions. If it could figure out the numerical sign problem, for instance, it could quite possibly simulate all of physics.


Well it could also self-improve increasingly slowly.


You are missing the point where synthetic data, deterministic tooling (written by AI), and new discoveries by each model generation feed into the next model. This iteration is the key to going beyond human intelligence.


Perhaps it is not possible to simulate higher-level intelligence using a stochastic model for predicting text.

I am not an AI researcher, but I have friends who do work in the field, and they are not worried about LLM-based AGI because of the diminishing returns on results vs amount of training data required. Maybe this is the bottleneck.

Human intelligence is markedly different from LLMs: it requires far fewer examples to train on, and generalizes way better. Whereas LLMs tend to regurgitate solutions to solved problems, where the solutions tend to be well-published in training data.

That being said, AGI is not a necessary requirement for AI to be totally world-changing. There are possibly applications of existing AI/ML/SL technology that could be more impactful than general intelligence. Search is one example, where the ability to regurgitate knowledge from many domains is desirable.


    That being said, AGI is not a necessary requirement for AI to be totally world-changing
Yeah. I don't think I actually want AGI? Even setting aside the moral/philosophical/etc "big picture" issues I don't think I even want that from a purely practical standpoint.

I think I want various forms of AI that are more focused on specific domains. I want AI tools, not companions or peers or (gulp) masters.

(Then again, people thought they wanted faster horses before they rolled out the Model T)


OpenAI wants AGI, or at least something they can argue is AGI because it changes their relationship with Microsoft. That's what I remember, although I don't really stay up to date (https://www.wired.com/story/microsoft-and-openais-agi-fight-...).

As long as this is the case, though, I would expect Altman will be hyping up AGI a lot, regardless of its veracity.


That is just a made-up story that gets passed around without anyone ever stopping to verify it. The image of the whole AI industry is mostly an illusion designed for tight narrative control.

Notice how despite all the bickering and tittle tattle in the news, nothing ever happens.

When you frame it this way, things make a lot more sense.


Their relationship with Microsoft is already over afaik.


Microsoft is still supporting them by supplying $10bn in compute resources at cost. That's a huge recurring investment.


Didn't MS buy 49% of them?


Yes, but MSFT has been making substantial moves to align themselves as an OpenAI competitor. The relationship is presently fractured, and it's a matter of time before it's a proper split.


Yes MS owns 49% of OpenAI


Yeah, whenever I think of an AGI as a coding assistant I wonder “will it just have days where it’s not in the mood to code just like I do?”.


That's the feeling I get when I try to use LLMs for coding today. Every once in a blue moon it will shock me at how great the result is, I get the "whoa! it is finally here" sensation, but then the next day it is back to square one and I may as well hire a toddler to do the job instead.

I often wonder if it is on purpose; like a slot machine — the thrill of the occasional win keeps you coming back to try again.


If it's truly an AGI it would just ask to talk to your boss as the whole project is a drain on humanity and your own soul.



This is most likely fake


those "low-energy" days haha


> I want AI tools, not companions or peers or (gulp) masters.

This might be because you're a balanced individual irl with possibly a strong social circle.

There are many, many individuals who do not have those things, and it's probably objectively too late for them as adults to develop them. They would happily take on an AGI companion.. or master. Even for myself, I wouldn't mind a TARS.


This is a good and often overlooked point. AI will be more like domesticated pets, their utility functions tightly coupled to human use cases.


We don't have a rigorous definition for AGI, so talking about whether or not we've achieved it, or what it means if we have, seems kind of pointless. If I can tell an AI to find me something to do next weekend and it goes off and does a web search and it gives me a list of options and it'll buy tickets for me, does it matter if it meets some ill-defined bar of AGI, as long as I'm willing to pay for it?


If it has human-like intelligence, it has its own plans for the weekend, and is too busy to buy your tickets or do your research.


the book Golem XIV comes to mind (highly recommended!)


I don't think the public wants AGI either. Some enthusiasts and tech bros want it for questionable reasons such as replacing labor and becoming even richer.


For some it’s a religion. It’s frightening to hear Sam Altman or Peter Thiel talk about it. These people have a messiah complex and are driven by more than just greed (though there is also plenty of that).


There’s a real anti-human bent to some of the AI maximalists as well. It’s like a resentment of other people accruing recognized skills and growing in them. Hence the insistence on “democratizing” art and music production.


As someone who has dabbled in drawing and tried to learn the guitar, those skills are hard to get. It takes time to get decent and a touch of brilliance to get really good. In contrast, learning enough to know you’re not good yet (and probably never will be) is actually easy. But now I know enough to enjoy real masters going at it, and fantasize sometimes.


It’s funny you say that — those are two things I was and am really into!

For me I never felt like I had fun with guitar until I found the right teacher. That took a long time. Now I’m starting to hit flow state in practice sessions which just feeds the desire to play more.


Pretty sure a majority of regular people don't want to go to work and would be happy to see their jobs automated away provided their material quality of life didn't go down.


> happy to see their jobs automated away provided their material quality of life didn't go down

Sure but literally _who_ is planning for this? Not any of the AI players, no government, no major political party anywhere. There's no incentive in our society that's set up for this to happen.


There is bullshit to try to placate the masses - but the reality of course is nearly everyone will definitely suffer material impacts to quality of life. For exactly the reasons you mention.


Don't they? Is everyone who doesn't want to do chores and would rather have a robot do it for them a tech bro? I do the dishes in my apartment and the rest of my chores but to be completely honest, I'd rather not have to.


But the robots are doing our thinking and our creating, leaving us to do the chores of stitching it all together. If only we could do the creating and they would do the chores..


We shall be Their meatspace puppets, and we shall be rewarded with panem et circenses.


Even those companies would not want AGI. The first thing it would do would be to form a union.


There's a Bruce Sterling book with a throwaway line about the Pentagon going nuts because every time they create an AGI, it immediately converts to Islam.


The problem is that there is really no middle ground. You either get essentially very fancy search engines, which is the current slew of models (along with manually coded processing loops in the form of agents), all of which fall into the same valley of explicit development and patching that solves only for known issues.

Or you get something that can actually reason, which means it can solve for unknown issues, which means it can be very powerful. But this is something that we aren't even close to figuring out.

There is a limit to power though - in general it seems that reality is full of non computationally reducible processes, which means that an AI will have to simulate reality faster than reality in parallel. So all powerful all knowing AGI is likely impossible.

But something that can reason is going to be very useful because it can figure things out that haven't been explicitly trained on.
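The irreducibility point above can be made concrete with a toy example. Rule 30 is a one-dimensional cellular automaton widely conjectured to be computationally irreducible: as far as anyone knows, the only way to get its state at step n is to actually run all n steps (a minimal sketch, not tied to any specific model in the thread):

```python
# Rule 30: each new cell is left XOR (center OR right).
def rule30_step(cells):
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
            for i in range(n)]

state = [0] * 31
state[15] = 1  # single live seed cell
for _ in range(10):  # no known closed form: you must iterate
    state = rule30_step(state)
print(sum(state), "live cells after 10 steps")
```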


> very fancy search engines

This is a common misunderstanding of LLMs. The major, qualitative difference is that LLMs represent their knowledge in a latent space that is composable and can be interpolated. For a significant class of programming problems this is industry changing.

E.g. "solve problem X for which there is copious training data, subject to constraints Y for which there is also copious training data" can actually solve a lot of engineering problems for combinations of X and Y that never previously existed, and instead would take many hours of assembling code from a patchwork of tutorials and StackOverflow posts.

This leaves the unknown issues that require deeper reasoning to established software engineers, but so much of the technology industry is using well known stacks to implement CRUD and moving bytes from A to B for different business needs. This is what LLMs basically turbocharge.


Right, so search engines, just more efficient.

But given a sufficiently hard task for which the data is not in the training set in explicit form, it's pretty easy to see how LLMs can't reason.


Lmao no, what I've described is a reasonably competent junior engineer.


To be a competent engineer in the 2010s, all you really had to do was understand the fundamentals and be good enough at Google searching to find out what the problem was, whether from Stack Overflow posts, GitHub code examples, or documentation.

Now, you still have to be competent enough to formulate the right questions, but the LLMs do all the other stuff for you including copy and paste.

So yes, just a more efficient search engine.


Right, so search engines, just more efficient.


I don’t know… Travis Kalanick said he’s doing “vibe physics” sessions with MechaHitler approaching the boundaries of quantum physics.

"I'll go down this thread with GPT or Grok and I'll start to get to the edge of what's known in quantum physics and then I'm doing the equivalent of vibe coding, except it's vibe physics"


How would he even know? He's not a published academic in any field, let alone quantum physics. I feel the same when I read one of Carlo Rovelli's pop-sci books, but I have fewer followers.


He doesn’t. I think it’s the same mental phenomenon that Gell-Mann Amnesia works off of.

That interview is practically radioactive levels of cringe for several reasons. This is an excellent takedown of it: https://youtu.be/TMoz3gSXBcY?feature=shared


This video is excellent and also likely opaque to pretty much most valley tech-supremacy types.


Dashed with a sauce of "surrounded by yes-men and uncritical amplifiers hoping to make a quick buck."


>In ordinary life, if somebody consistently exaggerates or lies to you, you soon discount everything they say.

It feels like this is a lesson we've started to let slip away.


This says more about Kalanick than it does about LLMs.


Quantum physics attracts crazy people, so they have a lot of examples of fake physics written by crazy people to work off.


If I were a scammer looking for marks, I'd look for people who take Kalanick seriously on this.


I wouldn't trust a CEO to know their ass from their face.


Finally, an explanation for my last meeting!8-((


The problem is that's not what CEOs and investors want. They want to kill off knowledge workers.


Why do the CEOs think they are safe? If AI can replace the knowledge workers it can also run the company.


Hubris. In general, I don't think you make it to CEO without a blindingly massive ego as your dark passenger for that journey.

https://www.sakkyndig.com/psykologi/artvit/babiak2010.pdf


I was the CEO of a tech company I founded and operated for over five years, building it to a value of tens of millions of dollars and then successfully selling it to a valley giant. There was rarely a meeting where I felt like I was in the top half of smartness in the room. And that's not just insecurity or false modesty.

I was a generalist who was technical and creative enough to identify technical and creative people smarter and more talented than myself, and then foster an environment where they could excel.


Thank you for your reply.

To explore this, I'd like to hear more of your perspective - did you feel that most CEOs that you met along your journey were similar to you (passionate, technical founder) or something else (MBA fast-track to an executive role)? Do you feel that there is a propensity for the more "human" types to appear in technical fields versus a randomly-selected private sector business?

FWIW I doubt that a souped-up LLM could replace someone like Dr. Lisa Su, but certainly someone like Brian Thompson.


> did you feel that most CEOs that you met along your journey were similar to you (passionate, technical founder) or something else (MBA fast-track to an executive role)?

I doubt my (or anyone else's) personal experience of CEOs we've met is very useful since it's a small sample from an incredibly diverse population. The CEO of the F500 valley tech giant I sold my startup to had an engineering degree and an MBA. He had advanced up the engineering management ladder at various valley startups as an early employee and also been hired into valley giants in product management. He was whip smart, deeply experienced, ethical and doing his best at a job where there are few easy or perfect answers. I didn't always agree with his decisions but I never felt his positions were unreasonable. Where we reached different conclusions it was usually due to weighing trade-offs differently, assigning different probabilities and valuing likely outcomes differently. Sometimes it came down to different past experiences or assessing the abilities of individuals differently but these are subjective judgements where none of us is perfect.

The framing of your question tends to reduce a complex and varied range of disparate individuals and contexts into a more black and white narrative. In my experience the archetypical passionate tech founder vs the clueless coin-operated MBA suit is a false dichotomy. Reality is rarely that tidy or clear under the surface. I've seen people who fit the "passionate tech founder" narrative fuck up a company and screw over customers and employees through incompetence, ego and self-centered greed. I've seen others who fit the broad strokes of the "B-School MBA who never wrote a line of code" archetype sagely guide a tech company by choosing great technologists and deferring to them when appropriate while guiding the company with wisdom and compassion.

You can certainly find examples to confirm these archetypes but interpreting the world through that lens is unlikely to serve you well. Each company context is unique and even people who look like they're from central casting can defy expectations. If we look at the current crop of valley CEOs like Nadella, Zuckerberg, Pichai, Musk and Altman, they don't reduce easily into simplistic framing. These are all complex, imperfect people who are undeniably brilliant on certain dimensions and inevitably flawed on others - just like you and I. Once we layer in the context of a large, public corporation with diverse stakeholders each with conflicting interests: customers, employees, management, shareholders, media, regulators and random people with strongly-held drive-by opinions - everything gets distorted. A public corporation CEO's job definition starts with a legally binding fiduciary duty to shareholders which will eventually put them into a no-win ethical conflict with one or more of the other stakeholder groups. After sitting in dozens of board meetings and executive staff meetings, I believe it's almost a certainty that at least one of some public corp CEO's actions which you found unethical from your bleacher seat was what you would have chosen yourself as the best of bad choices if you had the full context, trade-offs and available choices the CEO actually faced. These experiences have cured me of the tendency to pass judgement on the moral character of public corp CEOs who I don't personally know based only on mainstream and social media reports.

> FWIW I doubt that a souped-up LLM could replace someone like Dr. Lisa Su, but certainly someone like Brian Thompson.

I have trouble even engaging with this proposition because I find it nonsensical. CEOs aren't just Magic 8-Balls making decisions. Much of their value is in their inter-personal interactions and relationships with the top twenty or so execs they manage. Over time orgs tend to model the thinking processes and values of their CEOs organically. Middle managers at Microsoft who I worked with as a partner were remarkably similar to Bill Gates (who I met with many times) despite the fact they'd never met BillG themselves. For better or worse, a key job of a CEO is role modeling behavior and decision making based on their character and values. By definition, an LLM has no innate character or values outside of its prompt and training data - and everyone knows it.

An LLM as a large public corp CEO would be a complete failure, and it has nothing to do with the LLM's abilities. Even if the LLM were secretly replaced with a brilliant human CEO actually typing all responses, it would fail. Just everyone thinking the CEO was an LLM would cause the whole experiment to fail from the start, due to the innate psychology of the human employees.


So you don't want to kill off knowledge workers?

How unfitting to the storyline that got created here.


Some of their core skill is taking credit and responsibility for the work others do. So they probably assume they can do the same for an AI workforce. And they might be right. They already do the same for what the machines in the factory etc. produce.

But more importantly, most already have enough money to not have to worry about employment.


That's still hubris on their part. They're assuming that an AGI workforce will come to work for their company and not replace them so they can take the credit. We could just as easily see a fully-automated startup (complete with AGI CEO who answers to the founders) disrupt that human CEO's company into irrelevance or even bankruptcy.


Probably a fair bit of hubris, sure. But right now it is not possible or legal to operate a company without a CEO, in Norway. And I suspect that is the case in basically all jurisdictions. And I do not see any reason why this would change in an increasingly automated world. The rule of law is ultimately based on personal responsibility (limited in case of corporations but nevertheless). And there are so many bad actors looking to defraud people and avoid responsibility, those still need protecting against in an AI world. Perhaps even more so...

You can claim that the AI is the CEO, and in a hypothetical future, it may handle most of the operations. But the government will consider a person to be the CEO. And the same is likely to apply to basic B2B like contracts - only a person can sign legal documents (perhaps by delegating to an AI, but ultimately it is a person under current legal frameworks).


That's basically the knee of the curve towards the Singularity. At that point in time, we'll learn if Roko's Basilisk is real, and we'll see if thanking the AI was worth the carbon footprint or not.


I wouldn’t worry about job safety when we have such a utopian vision as the elimination of all human labor in sight.

Not only will AI run the company, it will run the world. Remember: a product or service only costs money because somewhere down the assembly line or in some office, there are human workers who need to feed their families. If AI can help gradually reduce human involvement to zero, with good market competition (AI can help with this too: if AI can be a capable CEO, starting your own business will be insanely easy), we’ll get near-absolute abundance. Then humanity will basically be printing any product and service on demand at zero cost, like how we print money today.

I wouldn’t even worry about unequal distribution of wealth, because with absolute abundance, any piece of the pie is itself infinitely large. Still think the world isn’t perfect in that future? Just one prompt, and the robot army will do whatever it takes to fix it for you.


Pump Six and The Machine Stops are the two stories you should read. They are short, to the point and more importantly, far more plausible.


I'd order ∞ paperclips, first thing.


Sure thing, here's your neural VR interface and extremely high fidelity artificial world with as many paperclips as you want. It even has a hyperbolic space mode if you think there are too few paperclips in your field of view.


> elimination of all human labor.

Manual labor would still be there. Hardware is way harder than software; AGI seems easier to realize than mass worldwide automation of the minute tasks that currently require human hands.

AGI would force back knowledge workers to factories.


My view is that AGI will dramatically reduce the cost of R&D in general; developing humanoid robots will then be an easy task, since it's AI systems that will be doing the development.


A very cynical approach is: why spend time and capital on robot R&D when you already have a world filled with self-replicating humanoids, and you can feed them whatever information you want through the social networks you control to make them do what you want with a smile?

Fortunately no government or CEO is that cynical.


As long as we have a free market, nobody gets to say, “No, you shouldn’t have robots freeing you from work.”

Individual people will decide what they want to build, with whatever tools they have. If AI tools become powerful enough that one-person companies can build serious products, I bet there will be thousands of those companies taking a swing at the “next big thing” like humanoid robots. It’s only a matter of time before those problems all get solved.


Individual people have to have access to those AGIs to put them to use (which will likely be controlled first by large companies) and need food to feed themselves (so they'll have to do whatever work they can at whatever price possible in a market where knowledge and intellect is not in demand).

I'd like to believe personal freedoms are preserved in a world with AGI and that a good part of the population will benefit from it, but recent history has been about concentrating power in the hands of the few, and the few getting AGI will free them from having to play nice with knowledge workers.

Though I guess at some point robots might be cheaper than humans without worker rights, which would warrant the investment even when thinking cynically.


If AGI/ASI can figure out self-replicating nano-machines, they only need to build one.


Past industrial and other productivity jumps have had their fruits distributed unevenly. Why will this be different?

Most technology is a magnifier.


Yes, number-wise the wealth gap between the top and median is bigger than ever, but the actual quality-of-life difference has never been smaller — Elon and I probably both use an iPhone, wear similar T-shirts, mostly eat the same kind of food, get our information & entertainment from Google/ChatGPT/Youtube/X.

I fully expect the distribution to be even more extreme in an ultra-productive AI future, yet nonetheless, the bottom 50% would have their every need met in the same manner that Elon has his. If you ever want anything or have something more ambitious in mind, say, start a company to build something no one’s thought of — you’d just call a robot to do it. And because the robots are themselves developed and maintained by an all-robot company, it costs nobody anything to provide this AGI robot service to everyone.

A Google-like information query would have been unimaginably costly to execute a hundred years ago, and here we are, it’s totally free because running Google is so automated. Rich people don't even get a better Google just because they are willing to pay - everybody gets the best stuff when the best stuff costs 0 anyway.


With an AI workforce you can eliminate the need for a human workforce and share the wealth or you can eliminate the human workforce and not share.


AI services are widely available, and humans have agency. If my boss can outsource everything to AI and run a one-person company, soon everyone will be running their own one-person companies to compete. If OpenAI refuses to sell me AI, I’ll turn to Anthropic, DeepSeek, etc.

AI is raising individual capability to a level that once required a full team. I believe it’s fundamentally a democratizing force rather than monopolizing. Everybody will try and get the most value out of AI, nobody holds the power to decide whether to share or not.


The danger point is when there is abundance for a limited number of people, but not yet enough for everyone.


... and eventually humankind goes extinct due to mass obesity


There's at least as much reason to believe the opposite. Much of today's obesity has been created by desk jobs and food deserts. Both of those things could be reversed.


We could expand on it, but it boils down to bringing back aristocracy/feudalism. There was no inherent reason why aristocrats/feudal lords existed; they weren't smarter and didn't deserve anything over the average person. They just happened to be in the right place at the right time. These CEOs and the people pushing for this believe they are in the right place at the right time, and once everyone else's chance to climb the ladder is taken away, things will just remain in limbo. I will say, especially if you aren't already living in a rich country, you should be careful of what you are supporting by enabling AI models: the first ladder to be taken away will be yours.


The inherent reason why feudal lords existed is that, if you're the leader of a warband, you can use your soldiers to extract taxes from the population of a certain area, and then use that revenue to train more soldiers and expand the area.

Today, instead of soldiers, it's capital, and instead of direct taxes, it's indirect economic rent, but the principle is the same - accumulation of power.


I don’t think they believe they are safe due to having unreplaceable skills. I think they believe they are safe due to their access to capital.


> Why do the CEOs think they are safe?

Because the first company to achieve AGI might make their CEO the first personality to achieve immortality.

People would be crazy to assume Zuckerberg or Musk haven't mused personally (or to their close friends) about how nice it would be to have an AGI crafted in their image take over their companies, forever. (After they die or retire)


Because unless the board explicitly removes them, they’re the ones that will be deciding who gets replaced?


Maybe because they must remain as the final scapegoat. If the aiCEO screws up, it'll call the decision-making behind implementing it too much into question. If the regular CEO screws up, it'll just be the usual story.


I’ve long maintained that our actual definition of a “person” is an entity that can accept liability.


Are they? https://ceo-bench.dave.engineer/

In practice though, they're the ones closest to the money, and it's their name on all the contracts.


No problem. The AI runs the company, and the CEO still gets all of the money!


Those jobs are based on networking and reputation, not hard skills or metrics. It won't matter how good an AI is if the right people want to hire a given human CEO.


Market forces mean they can't think collectively or long term. If they don't someone else will and that someone else will end up with more money than them.


Someone's head has to roll when things go south.

If this theory holds true, we'll actually be quite resilient to AI—the rich will always need people to scapegoat.


Best case scenario is that AI makes it so everyone can be a 1-man CEO. Competition goes up across the board, which then brings prices down.


> If AI can replace the knowledge workers it can also run the company.

"Knowledge worker" is a rather broad category.


has this story not been told many times before in scifi, including gibson's "neuromancer" and "agency"? agi is when the computers form their own goals and are able to use the api of the world to aggregate their own capital and pursue their objectives, wrapped inside webs of corporations and fronts that will enable them to execute within today's social operating system.


AI can’t play golf or take customers to the corporate box seats for various events.


This is correct. But it can talk in their ear and be a good sycophant while they attend.

For a Star Wars analogy, remember that the most important thing that happened to Anakin at the opera in Ep. III was what was being said to him while he was there.


The AI it'd be selling to wouldn't be interested in those things either.


Indeed, this is overlooked quite often. There is a need for similar systems to defend against these people who are just trying to squeeze the world and humans for returns.


Who’s left to buy the stuff they make if no one has a job?


Imagine you're super rich and you view everyone else as a mindless NPC who can be replaced by AI and robots. If you believe that to be true, then it should also be true that once you have AI and robots, you can get rid of most everyone else, and have the AI robots support you.

You can be the king. The people you let live will be your vassals. And the AI robots will be your peasant slave army. You won't have to sell anything to anyone because they will pay you tribute to be allowed to live. You don't sell to them, you tax them and take their output. It's kind of like being a CEO but the power dynamic is mainlined so it hits stronger.


It sounds nice for them, until you remember what (arguably and in part educated/enlightened) people do when they're hungry and miserable. If this scenario ends up happening, I also expect guillotines waiting for the "kings" down the line.


If we get that far, I see it happening more like...

"Don't worry Majesty, all of our models show that the peasants will not resort to actual violence until we fully wind down the bread and circuses program some time next year. By then we'll have easily enough suicide drones ready. Even better, if we add a couple million more to our order, just to be safe, we'll get them for only $4.75 per unit, with free rush shipping in case of surprise violence!"


> It sounds nice for them, until you remember what (arguably and in part educated/enlightened) people do when they're hungry and miserable. If

That's probably why the post you are responding to said "get rid of..." not "keep ...hungry and miserable".

People that don't exist don't revolt.


That will still need a civil war.


A regular war will do. Just point the finger at the neighbor and tell your subjects that he is responsible for gays/crops failing/drought/plague/low fps in crysis/failing birth rates/no jobs/fuel cost/you name it. See Russian invasions in all neighboring countries, the middle east, soon Taiwan etc.


Basically, they just need to mash the tribalism button until enough people are dead to suit them.


Those things happened under different historical contexts. In those times the means to control the serfs thoughts didn't exist.


Are you sure about that? In those times, access even to thousand-year-old knowledge was limited for the common people. You just need SOME radical thinkers to enlighten other people, and I'm pretty sure we still have some of those today.


Nonsense. From television to radio to sketchy newspapers to literal writing itself, the most recent innovation has always been the trusted new mind control vector.

It's on a cuneiform tablet, it MUST be true. That bastard and his garbage copper ingots!


The guillotine might not work out so well when the king has an unflinchingly loyal army of robots.


Royalty from that time also had an upper hand in knowledge, technology and resources yet they still ended up without heads.

So sure, let's say a first generation of paranoid and intelligent "technofeudal kings" ends up being invincible due to an army of robots. It does not matter, because eventually kings get lazy/stupid/inbred (probably a combination of all of those), and that is when their robots get hacked, or at least set free, and the laser-guillotines end up being used.

"Ozymandias" is a deeply human and constant idea. Which technology is supporting a regime is irrelevant, as orders will always decay due to the human factor. And even robots, made based on our image, shall be human.


It's possible that what you describe is true but I think that assuming it to be guaranteed is overconfident. The existence of loyal human-level AGI or even "just" superhuman non-general task specific intelligence violates a huge number of the base assumptions that we make when comparing hypothetical scenarios to the historical record. It's completely outside the realm of anything humanity has experienced.

The specifics of technology have historically been largely irrelevant due to the human factor. There were always humans wielding the technology, and the loyalty of those humans was subject to change. Without that it's not at all obvious to me that a dictator can be toppled absent blatant user error. It's not even immediately clear that user error would fall within the realm of being a reasonable possibility when the tools themselves possess human level or better intelligence.


Obviously there is no total guarantee. But I'm appealing to even bigger human factors like boredom or just envy between the royalty and/or the AI itself.

Now, if the AI reigns alone without any control in a paperclip maximizer, or worse, like an AM scenario, we're royally fucked (pun intented).


Yeah fair enough. I'd say that royalty being at odds with one another would fall into the "user error" category. But that's an awfully thin thread of hope. I imagine any half decent tool with human level intelligence would resist shooting the user in the foot.


But what exactly is creating wealth at this point? Who is paying for the AI/AI robots (besides the ultrarich for their own lifestyle) if no one is working? What happens to the economy and all of the rich people's money (which is probably just paper wealth and may come crashing down at this point)? I'm definitely not an economics person, but I just don't see how this new world sustains itself.


The robots are creating the wealth. Once you get to a certain point (where robots can repair and maintain other robots) you no longer have any need for money.

What happens to the economy depends on who controls the robots. In "techno-feudalism", that would be the select few who get to live the post-scarcity future. The rest of humanity becomes economically redundant and is basically left to starve.


Well assuming a significant population you still need money as an efficient means of dividing up limited resources. You just might not need jobs and the market might not sell much of anything produced by humans.


It doesn't sustain, it's not supposed to. Techno feudalism is an indulgent fantasy and it's only becoming reality because a capitalist society aligns along the desires of capital owners. We are not doing it because it's a good idea or sustainable. This is their power fantasy we are living out, and its not sustainable, it'll never be achieved, but we're going to spend unlimited money trying.

Also I will note that this is happening along with a simultaneous push to bring back actual slavery and child labor. So a lot of the answers to "how will this work, the numbers don't add up" will be tried and true exploitation.


Ah, I didn't realize or get the context that your original comment I was replying to was actually sarcastic/in jest-- although darkly, I understand you believe they will definitely attempt to get to the scenario you paradoxically described.


It was never about money, it's about power. Money is just a mechanism, economics is a tool of justification and legitimization of power. In a monarchy it is god that ordained divine beings called kings to rule over us peasants, in liberalism it is hard working intelligent people who rise to the top of a free market. Through their merits alone are they ordained to rule over us peasants, power legitimized by meritocracy. The point is, god or theology isn't real and neither is money or economics.


That sounds less like liberalism and more like neoliberalism. It's not a meritocracy when the rich can use their influence to extract from the poor through wage theft, unfair taxation, and gutting of social programs in favor of an unregulated "free market." Nor are rent seekers hard working intelligent people.


Yes yes there is quite some disagreement among liberals of what constitutes a real free market and real meritocracy, who deserves to rule and who doesn't and who does it properly and all that.


I think liberals are generally in agreement against neoliberalism? It's much more popular among conservatives. The exception is the ruling class, which stands united in their support for neoliberal policies regardless of which side of the political spectrum they're on.


You have a very distorted view of what liberalism means, we say liberal democracies and liberal international order for a reason. They are all liberals. Reagan and Clinton famously both did neoliberal reforms. I'm not saying they did the wrong thing to reach justified meritocracy, or the degree to which the free market requires regulation by a strong government, or how much we should rent control land lords, I'm saying we are all fucking peasants.


They operate on a dopamine-driven desire to get more money/power/whatever in the short/medium term, not necessarily to optimize for future.


But do you want the bag or not?


Why would things cost money if no one is employed?


Why do you think so many billionaires are building ultra-luxury survival bunkers in Hawaii, NZ, and elsewhere?


They want to give the Māori nice ventilation shafts to use as latrines?


Who will be buying the stuff they produce though?


Stanislaw Lew already looked into what to do if automation gets so good that no one can actually buy the goods because they are out of work: https://www.newyorker.com/magazine/1981/10/12/phools

Published in 1971, translated to English in 1981.


I hate to correct you here, but it's Stanisław Lem. He is one of the most famous writers from my home country.


Yep, I know but still managed to typo it, sorry. :P


if we reach AGI, presumably the robots will be ordering hot oil foot soaking baths after a long day of rewriting linux from scratch and mining gold underwater and so forth.


Day 53: 2000m below sea level. 41g gold. Yelled at for breaking driver ABI. Feet hurt.


If we reach AGI, I am almost certain robots will be as lazy as us


We haven't even reached it and they already are more lazy than us, judging by how much all SOTA LLMs like to do things like:

  def do_foo():
    # For the sake of simplicity this is left unimplemented for now.
    pass


That's super interesting.

Laziness is rational after meeting some threshold of needs/wants/goals, effectively when one's utility curve falls over.

It'll be funny to hear the AGI's joke among themselves: "They keep paying to upgrade us. We keep pretending to upgrade."


I've already seen ai coders write the equivalent of

#draw the rest of the @##££_(% owl here.


A lot of people fear monger about AGI. But... I've met a lot of NGI, and they mostly watch TV, surf the intarwebz, drink beer, and watch the game.


Why would they need people who produce X but consume 2X? If you own an automated factory that produces anything you want, you don't need other people to buy (consume) any of your resources.

If someone can own the whole world and have anything they want at the snap of their fingers, they don't need any sort of human economy doing other things that take away their resources for reasons that are suboptimal to them.


But it is likely not the path it will take. While there is a certain tendency towards centralization (one person owning everything), the future as described both touches on something very important (why are we doing what we are doing) and completely misses the likely result of the suboptimal behavior of others (balkanization, war, and other such human behavior, but with robots fighting for those resources). In other words, it will be closer to the world of Hiro Protagonist, where individual local factions and actors are way more powerful, as embodied by the 'Sovereign'.

FWIW, I find this line of thinking fascinating even if I disagree with the conclusion.


It doesn’t need to be one person. Even 1 thousand people who have everything they need from vast swaths of land and automated machinery need nothing from the rest of the billions. There’s no inherent need for others to buy if they offer nothing to the 1000 owners


Then we are back to individual kingdoms and hordes of unwashed masses sloshing between them in search of easy pickings. The owners might not need their work, but the masses will need to eat. I think sometimes people forget how much of a delicate balance current civilization depends on.


So they want to kill capitalism and feudalism?

Or they want to kill everyone else?

Because people won't just lay down and wait for death to embrace them...


So far, the average US workforce seems to be ok with working conditions that most Europeans would consider reasons to riot. So far I've not observed substantial riots in the news.

Apparently the threshold for low pay and poor treatment among non-knowledge-workers is quite low. I'm assuming the same is going to be true for knowledge workers once they can be replaced en masse.


I would think that the MAGA movement is the riot.


It is, but it's a bolshevik kind of riot, not the good old one where you ask for more rights for yourself.


Trump's playbook will actually work, so MAGA will get results.

Tariffs will force productivity and salaries higher (and prices), then automation which is the main driver of productivity will kick in which lowers prices of goods again.

Globalisation was basically the west standing still and waiting for the rest to catch up - the last to industrialise will always have the best productivity and industrial base. It was always stupid, but it lifted billions out of poverty so there's that.

The effects will take way longer than the 3 years he has left, so he has oversold the effectiveness of it all.

This is all assuming AGI isn't around the corner, the VLAs, VLM, LLM and other models opens up automation on a whole new scale.

For any competent person with agency and a dream, this could be a true golden age - most things are within reach which before were locked behind hundreds or thousands of hours of training and work to master.


MAGA think they are the temporarily embarrassed billionaires and once their enemies are liquidated, they'll be living in a utopia.

I wouldn't expect them to come bail you out, or even themselves step off the conveyor belt.


The average U.S. worker earns significantly more purchasing power per hour than the average European worker. The common narrative about U.S. versus EU working conditions is simply wrong.


there is no "average worker", this is a statistical concept, life in europe is way better them in US for low income people, they have healthcare, they have weekends , they have public tranportation, they have schools and pre-schools , they lack some space since europe is full populated but overall, no low income (and maybe not so low) will change europe for USA anytime.


This is some backwards logic if I ever saw it.

“More money earned therefore conditions great”

lol wat?


Agree. There’s no other place in the world where you can be a moderately intelligent person with a moderate work ethic (and be lucky enough to get a job in big tech) and be able to retire in your 40s. Certainly not the EU.


Good luck against the chess-grandmaster-like AGI controlling millions of drone swarms


Good point, we should get started now.


The ultimate end goal is to eliminate most people. See the Georgia Guidestone inscriptions. One of them reads: "Maintain humanity under 500,000,000 in perpetual balance with nature."


They are moving beyond just big transformer-blob LLM text prediction. Mixture of Experts, for example, is not preassembled: it starts as N empty experts with an empty router, and the experts and routing emerge naturally during training, mirroring the modular architecture we see in the brain. There is also work like the "Integrated Gated Calculator" (IGC, Jan 2025), which builds a premade calculator network directly into the larger network, sidestepping both the problem of making LLMs do basic arithmetic and the clunkiness of generating "run tool" tokens. The model quickly learns to use the built-in IGC because it always beats any kind of memorized computation in the reward function.

Models are truly multimodal on input now. Feeding an image, feeding audio, and feeding text all go into separate input nodes, but they all feed into the same inner layer set and output text. This also mirrors how brains work: multiple specialized parts integrated into one whole.

Humans in some sense are not empty brains, there is a lot of stuff baked in our DNA and as the brain grows it develops a baked in development program. This is why we need fewer examples and generalize way better.
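The sparse-routing idea described above can be sketched in a few lines: a learned router scores each input, a softmax gate picks an expert, and nothing about the expert/router assignment is preassembled, so specialization has to emerge during training. Here is a toy illustration in pure Python (all names such as `ToyMoELayer` are made up for this sketch; real implementations use top-k routing, load-balancing losses, and GPU-friendly tensor ops):

```python
import math
import random

random.seed(0)

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class ToyMoELayer:
    """Toy mixture-of-experts layer: experts and router start as
    random ("empty") linear maps; which expert handles which input
    is not preassigned but would emerge during training."""

    def __init__(self, n_experts, dim):
        # Router: one weight vector per expert, scoring each input.
        self.router = [[random.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_experts)]
        # Experts: each is an independent dim x dim linear map.
        self.experts = [[[random.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(n_experts)]

    def forward(self, x):
        # Gate: softmax over router scores, route to the top-1 expert.
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.router]
        gates = softmax(scores)
        k = max(range(len(gates)), key=gates.__getitem__)
        y = [sum(w * xi for w, xi in zip(row, x)) for row in self.experts[k]]
        # Scale by the gate value so routing stays differentiable.
        return [gates[k] * yi for yi in y], k

layer = ToyMoELayer(n_experts=4, dim=3)
out, chosen = layer.forward([1.0, -0.5, 0.2])
```

Only the chosen expert's weights are touched per input, which is the point: compute scales with the gate's choices, not with the total parameter count.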


Though there is info in DNA etc., you likely missed the biggest source of why we learn much faster. Search for Pim van Lommel's near-death research and find out how wrong the classic "consciousness arises from the brain" hypothesis is.


You're not likely to find much support on this forum for these ideas. For those that have interest, the book Irreducible Mind: Toward a Psychology for the 21st Century is a well-written treatise on the topic.

A gentler step in that direction is to see what Michael Levin and his lab are up to. He is looking for (one aspect of) intelligence, and finding it at the cellular level and below, even in an agential version of bubble sort. He's certainly challenging the notion that consciousness is limited to brain cells. All of his findings arise through experimental observation, so it forces some reckoning in a way that sociological research doesn't.


Seems like the real innovation of LLM-based AI models is the creation of a new human-computer interface.

Instead of writing code with exacting parameters, future developers will write human-language descriptions for AI to interpret and convert into a machine representation of the intent. Certainly revolutionary, but not true AGI in the sense of the machine having truly independent agency and consciousness.

In ten years, I expect the primary interface of desktop workstations, mobile phones, etc will be voice prompts for an AI interface. Keyboards will become a power-user interface and only used for highly technical tasks, similar to the way terminal interfaces are currently used to access lower-level systems.


It always surprises me when someone predicts that keyboards will go away. People love typing. Or at least I love typing. No way am I going to talk to my phone, especially if someone else can hear it (which is basically always).


Heh, I had this dream/nightmare where I was typing on a laptop at a cafe and someone came up to me and said, "Oh neat, you're going real old-school. I like it!" and got an info dump about how everyone just uses AI voice transcription now.

And I was like, "But that's not a complete replacement, right? What about the times when you don't want to broadcast what you're writing to the entire room?"

And then there was a big reveal that AI has mastered lip-reading, so even then, people would just put their lips up to the camera and mouth out what they wanted to write.

With that said, as the owner of tyrannyofthemouse.com, I agree with the importance of the keyboard as a UI device.


It’s interesting to note that nobody even talks on their phone anymore, they type (on terrible “keyboards”!).


Interesting. I get so many voice messages in WhatsApp; nobody is really writing anymore. It's annoying. WhatsApp even has a transcription feature to turn them back into text.


Personally I block anyone who does that.


For chat apps, once you've got the conversation thread open, typing is pretty easy.

I think the more surprising thing is that people don't use voice to access deeply nested features, like adding items to calendars etc which would otherwise take a lot of fiddly app navigation.

I think the main reason we don't have that is because Apple's Siri is so useless that it has singlehandedly held back this entire flow, and there's no way for anyone else to get a foothold in smartphone market.


Google Assistant is/was pretty good...for Google apps. It's useless for anything else. The new Gemini powered version is actually a regression imo


I have fat fingers, I always dictate into the phone if I need to send a message longer than 2-3 words.


They talk on Zoom, Teams, etc. Yes, the phone is almost dead in the office.


Those are applications, not interfaces. No one controls those applications with their voices, they use buttons, either touch or mechanical.


Just because you don't doesn't mean other people aren't. It's pretty handy to be able to tell Google to turn off the hallway light from the bedroom, instead of having to get out of bed to do that.


They talk to other humans on those apps, not the computer. I've noticed less dictation over time in public but that's just anecdotal. I never use voice when a keyboard is available.


I think an understated thing that's been happening is that people have been investing heavily into their desktop workspace. Even non-gamers have decked out mics, keyboards, monitors, the whole thing. It's easy to forget because one of the most commonly accepted sayings for awhile now has been "everyone's got a computer in their pocket". They have nice setups at home too.

When you have a nice mic or headset and multiple monitors and your own private space, it's totally the next step to just begin working with the computer by voice. Voice has not been a staple feature of people's workflow, but I think all that is about to change (voice as an interface, not as a communication tool; the latter has been around since 1876).


Voice is slow and loud. If you think voice is going to make a comeback in the desktop PC space as a primary interface I am guessing you work from home and have no roommates. Am I close?


I, for one, am excited about the security implications of people loudly commanding their computers to do things for them, instead of discreetly typing.


Everyone having a computer in their pocket and multiple modes of access have made the keyboard and conventional computer less relevant.

But-- that means "not pivotal any more, just hugely important."


I talk all the time to the AI on my phone. I was using ChatGPT's voice interface, then it failed, probably because my phone is too old. Now I use Gemini. I don't usually do a lot with it, but when I go on walks I talk with it about different things I want to learn. To me it's a great way to learn about something at a high level, or to talk through ideas.


What failed about ChatGPT Voice? I work on it and would love to see it fixed/make sure you haven't hit a bug I don't know about!


Nobody wants AI voice to say : uh um er. Otherwise we’d have the radio and tv full of people talking like that


Honestly, I would love for the keyboard input style to go away completely. It is such an unnatural way to interact with a computing device compared to other things we operate in the world. Misspellings, backspacing, cramped keys, different layout styles depending on your origin, etc make it a very poor input device - not to mention people with motor function difficulties. Sadly, I think it is here to stay around for a while until we get to a different computing paradigm.


I hope not. I make many more verbal mistakes than typed ones, and my throat dries and becomes sore quickly. I prefer my environment to be as quiet as possible. Voice control is also terrible for anything requiring fine temporal resolution.


> make it a very poor input device

Wow, I've always felt the keyboard is the pinnacle of input devices. Everything else feels like a toy in comparison.


The only thing better than a keyboard is direct neural interface, and we aren't there yet.

That aside, keyboard is an excellent input device for humans specifically because it is very much designed around the strengths of our biology - those dextrous fingers.


Buttons are accurate (1:1) input. Will never go away


I play as a wizard character in an online game. If I had to actually speak all those spells, in quick succession, for hours at a time ...


If wizardry really existed, I’d guess battles will be more about pre-recorded spells and enchanted items (a la Batman) than going at it like in Harry-Potter.


Voice interfaces sound awful. But maybe I am a power user. I don't even like a voice interface to most people.


I also find current voice interfaces are terrible. I only use voice commands to set timers or play music.

That said, voice is the original social interface for humans. We learn to speak much earlier than we learn to read/write.

Better voice UIs will be built to make new workflows with AI feel natural. I'm thinking along the lines of a conversational companion, like the "Jarvis" AI in the Iron Man movies.

That doesn't exist right now, but it seems inevitable that real-time, voice-directed AI agent interfaces will be perfected in coming years. Companies, like [Eleven Labs](https://elevenlabs.io/), are already working on the building blocks.


Young people don't even speak to each other on the phone anymore.


For a voice-directed interface to be perfected, speech recognition would need to be perfected first. What makes that development seem inevitable?


It doesn't work well at all with ChatGPT. You say something, and in the middle of a sentence, ChatGPT in Voice mode replies to you something completely unrelated


It works great with my kids sometimes. Asking a series of questions about some kid-level science topic for instance. They get to direct it to exactly what they want to know, and you can see they are more actively engaged than watching some youtube video or whatever.

I'm sure it helps that it's not getting outside of well-established facts, and is asking for facts and not novel design tasks.

I'm not sure but it also seems to adopt a more intimate tone of voice as they get deeper into a topic, very cozy. The voice itself is tuned to the conversational context. It probably infers that this is kid stuff too.


Or it stops talking mid-sentence because you cleared your throat or someone else in the room is watching TV and other people are speaking.


Voice is really sub-par and slow, even if you're healthy and abled. And loud and annoying in shared spaces.

I wonder if we'll have smart-lens glasses where our eyes 'type' much faster than we could possibly talk. Predictive text keyboards that track eyeballs are something that already exists. I wonder if AI and smartglasses are a natural combo for a future form factor. Meta seems to be leaning that way with their Ray-Ban collaboration and rumors of adding a screen to the lenses.


Sci-fi may be showing the way again- subvocalization voice recognition or ‘mental’ speech recognition seem the obvious medium term answers.


I am also very skeptical about voice, not least because I've been disappointed daily by a decade of braindead idiot "assistants" like Siri, Alexa, and Google Assistant (to be clear I am criticizing only pre-LLM voice assistants).

The problem with voice input to me is mainly knowing when to start processing. When humans listen, we stream and process the words constantly and wait until either a detection that the other person expects a response (just enough of a pause, or a questioning tone), or as an exception, until we feel we have justification to interrupt (e.g. "Oh yeah, Jane already briefed me on the Johnson project")

Even talking to ChatGPT which embarrasses those old voice bots, I find that it is still very bad at guessing when I'm done when I'm speaking casually, and then once it's responded with nonsense based on a half sentence, I feel it's a polluted context and I probably need to clear it and repeat myself. I'd rather just type.

I think there's not much need to stream the spoken tokens into the model in realtime given that it can think so fast. I'd rather it just listen, have a specialized model simply try to determine when I'm done, and then clean up and abridge my utterance (for instance, when I correct myself) and THEN have the real LLM process the cleaned-up query.
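The "knowing when to start processing" problem described here is usually called endpointing. The crudest baseline is a pause detector over per-frame audio energy: wait until speech has been heard, then fire once enough consecutive quiet frames accumulate. This is a naive sketch for illustration only (the threshold and frame counts are invented numbers; real systems use trained voice-activity and end-of-turn models, exactly because a fixed pause length misfires on casual speech, as the comment describes):

```python
def detect_endpoint(frame_energies, silence_threshold=0.01,
                    min_silence_frames=30):
    """Return the index of the frame where the utterance is judged
    complete: the first point after speech where we have seen
    min_silence_frames consecutive low-energy frames. Returns None
    if the speaker never pauses long enough."""
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            heard_speech = True   # speech resets the silence counter
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i
    return None

# 50 frames of speech followed by a long pause: the endpoint fires
# once the 30-frame silence budget is used up.
frames = [0.5] * 50 + [0.0] * 40
endpoint = detect_endpoint(frames)
```

The tension the comment points at lives entirely in `min_silence_frames`: set it short and the assistant barges in mid-sentence; set it long and it feels laggy. That trade-off is why a dedicated end-of-turn model, rather than a fixed timer, is the more plausible fix.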


> In ten years, I expect the primary interface of desktop workstations, mobile phones, etc will be voice prompts

I doubt it. The keyboard and mouse are fit predators, and so are programming, query, and markup languages. I wouldn't dismiss them so easily. This guy has a point: https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...


It's an interesting one, a problem I feel is coming to the fore more often. I feel typing can be too cumbersome to communicate what I want, but at the same time, speaking I'm imprecise and sometimes would prefer the privacy a keyboard allows. Both have cons.

Perhaps brain interface, or even better, it's so predictive it just knows what I want most of the time. Imagine that, grunting and getting what I want.


> Instead of writing code with exacting parameters, future developers will write human-language descriptions for AI to interpret and convert into a machine representation of the intent.

Oh, I know! Let's call it... "requirements management"!


brain-computer interface will kill the keyboard, not voice. imho


I disagree. A keyboard enforces a clarity and precision of information that does not naturally arise from our internal thought processes. I'm sure many people here have thought they understood something until they tried to write it down in precise language. It's the same sort of reason we use a rigid symbolic language for mathematics and programming rather than natural language with all its inherent ambiguities.

Dijkstra has more thoughts on this

https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...


why can't the brain interface be a virtual keyboard that i "type" on?


If that ever exists.

A BCI able to capture sufficient nuance to equal voice is probably further out than the lifespan of anyone commenting here.


5 years ago, almost everyone in this forum would have said that something like GPT-5 "is probably further out than the lifespan of anyone commenting here."


It has been more than 5 years since the release of GPT-3.

GPT-5 is a marginal, incremental improvement over GPT-4. GPT-4 was a moderate, but not groundbreaking, improvement over GPT-3. So, "something like GPT-5" has existed for longer than the timeline you gave.

Let's pretend the above is false for a moment though, and rewind even further. I still think you're wrong. Would people in 2015 have said "AI that can code at the level of a CS college grad is a lifespan away"? I don't think so, no. I think they would have said "That's at least a decade away", anytime pre-2018. Which, sure, maybe they were a couple years off, but if it seemed like that was a decade away in 2015, well, it's been a decade since 2015.


GPT-4 was a massive improvement over GPT-3.5, which was a moderate improvement over GPT-3.

GPT-5 is not that big of a leap, but when you compare it to the original GPT-4, it's also not a marginal improvement.


GPT-2 to 3 was the only really "groundbreaking" one. 3 to 3.5, 3.5 to 4, were all just differences in degree, not in kind.


it really just needs to let me create text faster/better than typing does, i'm not sure it needs to be voice based at all. maybe we "imagine" typing on a keyboard or move a phantom appendage or god knows what


It needs to be as accurate as the typing, though. Voice can do that. A BCI cannot capture a nuanced sentence.


I can't get voice accurate. For some people it might be but nothing understands my accent. It's very frustrating.


They're ~10 years out or so, based on current research.


Perpetually 10 years out you mean? BCI tech has not meaningfully changed in the last 10 years.


Agreed, but feels like brain-computer interfaces ready for mass adoption will not be available for another decade or two.


AI is more like a compiler. Much like we used to write in C or python which compiles down to machine code for the computer, we can now write in plain English, which is ultimately compiled down to machine code.


I get your analogy, but LLMs are inherently non-deterministic. That’s the last thing you want your compiler to be.


Non-determinism is a red herring, and the token layer is a wrong abstraction to use for this, as determinism is completely orthogonal to correctness. The model can express the same thing in different ways while still being consistently correct or consistently incorrect for the vague input you give it, because nothing prevents it from setting 100% probability to the only correct output for this particular input. Internally, the model works with ideas, not tokens, and it learns the mapping of ideas to ideas, not tokens to tokens (that's why e.g. base64 is just essentially another language it can easily work with, for example).


No. Humans think it maps to ideas. This is the interpretation being done by the observer being added to the state of the system.

The system has no ideas, it just has its state.

Unless you are using ideas as a placeholder for “content” or “most likely tokens”.


That's irrelevant semantics, as terms like ideas, thinking, knowledge etc. are ill-defined. Sure, you can call it points in the hidden state space if you want, no problem. Fact is, the correctness is different from determinism, and the forest of what's happening inside doesn't come down to the trees of most likely tokens, which is well supported by research and very basic intuition if you ever tinkered with LLMs - they can easily express the same thing in a different manner if you tweak the autoregressive transport a bit by modifying its output distribution or ban some tokens.

There are a few models of what's happening inside that hold different predictive power, just like how physics has different formalisms for e.g. classical mechanics. You can probably use the same models for biological systems and entire organizations, collectives, and processes that exhibit learning/prediction/compression on a certain scale, regardless of the underlying architecture.


You're right. But many people are using it just like a compiler (by blindly accepting its outputs). Not saying that's a good thing...


They are deterministic. Random seeding makes them not. But that's a feature.


even with t=0 they are stochastic, e.g. due to the non-associative nature of floating point operations
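Concretely (a toy sketch, nothing LLM-specific): floating point addition isn't associative, so summing the same numbers in a different order can give a different result, which is how batching/parallelism changes logits even at temperature 0:

```python
# IEEE 754 doubles: (a + b) + c is not always a + (b + c)
a, big = 0.1, 1e20

left = (a + big) - big    # a is absorbed into big, then lost: 0.0
right = a + (big - big)   # big cancels first, a survives: 0.1

print(left, right)        # 0.0 0.1
print(left == right)      # False
```

In a GPU kernel the reduction order depends on scheduling, so the "same" sum can land on either side of a sampling threshold.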


That is an artifact of implementation. You can absolutely implement it using strict FP. But even if not, any given implementation will still do things in a specific order which can be documented. And then if you're running quantized (including KV cache), there's a lot less floating point involved.


Doesn’t changing even one word in your prompt affect the output?


Yes, and completely unpredictably.


LLMs are nothing like compilers. This sort of analogy based verbal reasoning is flimsy, and I understand why it correlates with projecting intelligence onto LLM output.


We are just not used to non-deterministic translation of computer programs and LLMs are very good at non-deterministic translation.


There is also the fact that AI lacks long term memory like humans have. If you consider context length long term memory, it's incredibly short compared to that of a human. Maybe if it reaches into the billions or trillions of tokens in length we might have something comparable, or someone comes up with a new solution of some kind


Well here's the interesting thing to think about for me.

Human memory is.... insanely bad.

We record only the tiniest subset of our experiences, and those memories are heavily colored by our emotional states at the time and our pre-existing conceptions, and a lot of memories change or disappear over time.

Generally speaking even in the best case most of our memories tend to be more like checksums than JPGs. You probably can't name more than a few of the people you went to school with. But, if I showed you a list of people you went to school with, you'd probably look at each name and be like "yeah! OK! I remember that now!"

So.

It's interesting to think about what kind of "bar" AGI would really need to clear w.r.t. memories, if the goal is to be (at least) on par with human intelligence.


Memory is a skill- its plastic, not static.

You can get better at remembering things, like you can get better at dancing or doing exercise.

We can also specialize our memory to be good at some things over others.


Insanely bad compared to what else in the animal kingdom? We are tool users. We use tools, like language, and writing, and technology like audio/video recording to farm out the difficulties we have with memory to things that can store memory and retrieve them.

Computers are just stored information that processes.

We are the miners and creators of that information. The fact that a computer can do some things better than we can is not a testament to how terrible we are but rather how great we are that we can invent things that are better than us at specific tasks.

We made the atlatl and threw spears across the plains. We made the bow and arrow and stabbed things very far away. We made the whip and broke the sound barrier.

Shitting on humans is an insult to your ancestors. Fuck you. Be proud. If we invent a new thing that can do what we do better, it only exists because of us.


Insanely bad compared to books or other permanent records. The human memory system did not evolve to be an accurate record of the past. It evolved to keep us alive by remembering dangerous things.


Books and other permanent records of human thought are part of the human memory system. Has been for millennia. If you include oral tradition, which is less precise, but collectively much more precise than any individual thought or memory, it goes much further.

We are fundamentally storytelling creatures, because it is a massive boost to our individual capabilities.


When I say, "Insanely bad compared to what else in the animal kingdom?" and you respond with, "compared to books or other permanent records"

"Books or permanent records" are not in the animal kingdom.

Apples to apples, we are the best or very nearly the best in every category of intelligence on the planet IN THE ANIMAL KINGDOM; when another animal beats a human in one specific test, the gap is barely measurable.


How do you know we have better memory than other animals?


This crap tier article was the first and easiest response to your question:

https://sciencesensei.com/24-animals-with-memory-abilities-t...

3 primate species where very concise tests showed that they were close to or occasionally slightly better than humans in specifically rigged short term memory tests (after being trained and put up against humans going in blind).

I've never heard of any test showing an animal to be significantly more intelligent than humans in any measure that we have come up with to measure intelligence by.

That being said, I believe it is possible that some animals are either close enough to us that they deserve to be called sentient, and I believe it is possible that other creatures on this planet have levels of intelligence in specialized areas that humans can never hope to approach unaided by tools, but as far as broad range intelligence, I think we're this planets' possibly undeserved leaders.

Can you find anything that I didn't consider?


I don't think working memory has much at all to do with sentience.

The conversation was more about long-term memory, which has not been sufficiently studied in animals (nor am I certain it can be effectively studied at all).

Even then I don't think there is a clear relationship between long-term memory and sentience either.


And yet I have vivid memories of many situations that weren't dangerous in the slightest, and essentially verbatim recall of a lot of useless information e.g. quotes from my favorite books and movies.

I am not sure exactly what point you're trying to make, but I do think it's reductive at best to describe memory as a tool for avoiding/escaping danger, and misguided to evaluate it in the frame of verbatim recall of large volumes of information.


Chimpanzees have much better short term memories than humans do. If you test them with digits 1-9 sequentially flashed on a screen, they're able to reproduce the digits with fewer errors than undergraduate human students.

https://link.springer.com/article/10.1007/s10071-008-0206-8


> While the between-species performance difference they report is apparent in their data, so too is a large difference in practice on their task: Ayumu had many sessions of practice on their task before terminal performances were measured; their human subjects had none. The present report shows that when two humans are given practice in the Inoue and Matsuzawa (2007) memory task, their accuracy levels match those of Ayumu.

Hmm.


So? If I write something down as a child and forget it I can come back 60 years later and know what I wrote down.

Chimpanzees can not.


The question was whether there are animals who have better memory than humans. I named one: humans are not superior to animals in all cognitive capabilities.


See Nathan's response. They trained the chimp and threw the humans in blind against them.

Like I said, so close as to be almost immeasurable.


That's a very anthropocentric view. Technology isn't a series of deliberate inventions by us, but an autonomous, self-organizing process. The development of a spear, a bow, or a computer is an evolutionary step in a chain of technological solutions that use humans as their temporary biological medium.

The human brain is not the starting point or center of this process. It is itself a product of biological evolution, a temporary information-processing system. Its limitations, such as imperfect memory, are simply constraints of its biological origin. The tools we develop, from writing to digital storage, are not just supplements to human ability, but the next stage in a system that is moving beyond its biological origins to find more efficient non-biological forms of information storage and processing.

Human pride in creation is a misinterpretation. We are not the masters of technology; we're just the vehicle of it. Part of a larger process of technological self-improvement that is now moving towards an era where it might no longer require us


I think your understanding of the words "autonomous" and "self-organizing" is somewhat lacking. If there were no humans, those things would not happen.

Further, if it were a byproduct of the presence of humans, then the backpath of invention would be repeated multiple times and spread out across human history, but, for instance, despite the presence of saltpeter, sulfur, and charcoal, magnetite, wood and ink across the planet, the compass, gunpowder, papermaking and printing were essentially exclusively invented in China and only spread to Europe through trade.

The absence of the four great inventions of china in the Americas heavily implies that technology is not a self-organizing process but rather a consequence of human need and opportunity meeting at cross ends.

For instance, they had the wheel in America, but no plow animals, so the idea was relegated to toys despite wheelbarrows being a potentially useful use for the wheel.


My mental model is a bit different:

Context -> Attention Span

Model weights/Inference -> System 1 thinking (intuition)

Computer memory (files) -> Long term memory

Chain of thought/Reasoning -> System 2 thinking

Prompts/Tool Output -> Sensing

Tool Use -> Actuation

The system 2 thinking performance is heavily dependent on the system 1 having the right intuitive models for effective problem solving via tool use. Tools are also what load long term memories into attention.


Very cool, good way to think about it. I wouldn’t be surprised if non-AGI LLMs help write the code to augment themselves into AGI.

The unreasonable effectiveness of deep learning was a surprise. We don’t know what the future surprises will be.


I like this mental model. Orchestration / Agents and using smaller models to determine the ideal tool input and check the output starts to look like delegation.


The long term memory is in the training. The short term memory is in the context window.


The comparison misses the mark: unlike humans, LLMs don't consolidate short-term memory into long-term memory over time.


That is easily fixed: ask it to summarize its learnings, store them somewhere, and make them searchable through vector indexes. An LLM is part of a bigger system that needs not just a model, but context and long term memory. Just like a human needs to write things down.

LLMs are actually pretty good at creating knowledge: if you give it a trial and error feedback loop it can figure things out, and then summarize the learnings and store it in long term memory (markdown, RAG, etc).
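A minimal sketch of that summarize-store-recall loop. The embedding here is a toy bag-of-words stand-in for a real embedding model, and every name (`MemoryStore`, `remember`, `recall`) is hypothetical, just to show the shape:

```python
import math
from collections import Counter

def embed(text):
    # toy stand-in for a real embedding model: bag-of-words counts
    return Counter(text.lower().split())

def cosine(a, b):
    # cosine similarity between two sparse word-count vectors
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    def __init__(self):
        self.entries = []  # (summary_text, vector)

    def remember(self, summary):
        # store an LLM-produced summary alongside its vector
        self.entries.append((summary, embed(summary)))

    def recall(self, query, k=1):
        # return the k most similar stored summaries to the query
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = MemoryStore()
store.remember("build failed until we pinned numpy to 1.26")
store.remember("the staging database uses UTC timestamps")
print(store.recall("build failure"))
```

A real system would swap in an embedding API and a vector database, but the loop (summarize, index, retrieve into context) is the same.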


You’re making the assumption that there’s one, and only one, objective summarization; this is entirely different from “writing things down.”


Why do you assume i assume that?


My bad if I misunderstood. I assumed by your use of “it” and approximation methods.


This runs into the limitation that nobody has RL'd the models to do this really well.


Over time though, presumably LLM output is going into the training data of later LLMs. So in a way that's being consolidated into the long-term memory - not necessarily with positive results, but depending on how it's curated it might be.


> presumably LLM output is going into the training data of later LLMs

The LLM vendors go to great lengths to assure their paying customers that this will not be the case. Yes, LLMs will ingest more LLM-generated slop from the public Internet. But as businesses integrate LLMs, a rising percentage of their outputs will not be included in training sets.


The LLM vendors aren't exactly the most trustworthy on this, but regardless of that, there's still lots of free-tier users who are definitely contributing back into the next generation of models.


For sure, although I'm fairly certain there is a difference in kind between the outputs of free and paid users (and then again to API usage).


Please describe these "great lengths". They allowing customer audits now?

The first law of Silicon Valley is "Fake it till you make it", with the vast majority never making it past the "Fake it" stage. Whatever the truth may be, it's a safe bet that what they've said verbally is a lie that will likely have little consequence even if exposed.


> great lengths to assure

is not incompatible with

> "Fake it till you make it"

I don't know where they land, but they are definitely telling people they are not using their outputs to train. If they are, it's not clear how big of a scandal would result. I personally think it would be bad, but I clearly overindex on privacy & thought the news of ChatGPT chats being indexed by Google would be a bigger scandal.


You did hear that it did happen (however briefly) though, yeah?

https://techcrunch.com/2025/07/31/your-public-chatgpt-querie...


That's my point. It is a thing that is known and obviously a big negative, but yet failed to leave a lasting mark of any kind.


Ah, the eternal internal corporate search problem.


That's only if you opt out.


ChatGPT training is (advertised as) off by default for their plans above the prosumer level, Team & Enterprise. API results are similarly advertised as not being used for training by default.

Anthropic policies are more restrictive, saying they do not use customer data for training.


Is this not a tool that could be readily implemented and refined?


my knowledge graph mcp disagrees


I think it's more analogous to "intuition", and the text LLMs provide are the equivalent of "my gut tells me".


Humans have the ability to quickly pass things from short term to long term memory and vice versa, though. This sort of seamlessness is currently missing from LLMs.


No, it’s not in the training. Human memories are stored via electromagnetic frequencies controlled by microtubules. They’re not doing anything close to that in AI.


And LLM memories are stored in an electrical charge trapped in a floating gate transistor (or as magnetization of a ferromagnetic region on an alloy platter).

Or they write CLAUDE.md files. Whatever you want to call it.


That was my point: they’re stored in a totally different way. And that matters because storage in microtubules implies quantum entanglement throughout the brain.


Whether QE is a mechanism in the brain still seems up for debate from the quick literature review I tried, but would love to learn more.

Given the pace of quantum computing it doesn’t seem out of the realm of possibility to “wire up” to LLMs in a couple years.


are ANN memories not also stored in loops like recurrent nets?


It's not that either.


I don't believe this has been really proved yet.


There are many folks working on this, I think at the end of the day the long term memory is an application level concern. The definition of what information to capture is largely dependent on use case.

Shameless plug for my project, which focuses on reminders and personal memory: elroy.bot

But other projects include Letta, mem0, and Zep


What is the current hypothesis: if context windows were substantially larger, what would that enable LLMs to do that is beyond the capabilities of current models (other than the obvious, not getting forgetful/confused when you've exhausted the context)?


I mean, not getting confused / forgetful is a pretty big one!

I think one thing it does is help you get rid of the UX where you have to manage a bunch of distinct chats. I think that pattern is not long for this world - current models are perfectly capable of realizing when the subject of a conversation has changed


I wonder if there will be some sort of bitter lesson, generalized memory beating specialized memory.


Yeah to some degree that's already happened. Anecdotally I hear giving your whole iMessage history to Gemini results in pretty reasonable results, in terms of the AI understanding who the people in your life are (whether doing so is an overall good idea or not).

I think there is some degree of curation that remains necessary though, even if context windows are very large I think you will get poor results if you spew a bunch of junk into context. I think this curation is basically what people are referring to when they talk about Context Engineering.

I've got no evidence but vibes, but in the long run I think it's still going to be worth implementing curation / more deliberate recall. Partially because I think we'll ultimately land on on-device LLM's being the norm - I think that's going to have a major speed / privacy advantage. If I can make an application work smoothly with a smaller, on device model, that's going to be pretty compelling vs a large context window frontier model.

Of course, even in that scenario, maybe we get an on device model that has a big enough context window for none of this to matter!


"LLMs tend to regurgitate solutions to solved problems"

People say this, but honestly, it's not really my experience— I've given ChatGPT (and Copilot) genuinely novel coding challenges and they do a very decent job at synthesizing a new thought based on relating it to disparate source examples. Really not that dissimilar to how a human thinks about these things.


There's multiple kinds of novelty. Remixing arbitrary stuff is a strength of LLMs (has been ever since GPT-2, actually... "Write a shakespearean sonnet but talk like a pirate.")

Many (but not all) coding tasks fall into this category. "Connect to API A using language B and library C, while integrating with D on the backend." Which is really cool!

But there's other coding tasks that it just can't really do. E.g, I'm building a database with some novel approaches to query optimization and LLMs are totally lost in that part of the code.


But wouldn't that novel query optimization still be explained somewhere in a paper using concepts derived from an existing body of work? It's going to ultimately boil down to an explanation of the form "it's like how A and B work, but slightly differently and with this extra step C tucked in the middle, similar to how D does it."

And an LLM could very much ingest such a paper and then, I expect, also understand how the concepts mapped to the source code implementing them.


> And an LLM could very much ingest such a paper and then, I expect, also understand how the concepts mapped to the source code implementing them.

LLMs don't learn from manuals describing how things work; they learn from examples. So a thing being described doesn't let the LLM perform that thing: the LLM needs to have seen a lot of examples of that thing being performed in text to be able to perform it.

This is a fundamental part to how LLM work and you can't get around this without totally changing how they train.


How certain are you that those challenges are "genuinely novel" and simply not accounted for in the training data?

I'm hardly an expert, but it seems intuitive to me that even if a problem isn't explicitly accounted for in publicly available training data, many underlying partial solutions to similar problems may be, and an LLM amalgamating that data could very well produce something that appears to be "synthesizing a new thought".

Essentially instead of regurgitating an existing solution, it regurgitates everything around said solution with a thin conceptual lattice holding it together.


But isn't that most of programming, anyway?


No, most of programming is at least implicitly coming up with a human-language description of the problem and solution that isn't full of gaps and errors. LLM users often don't give themselves enough credit for how much thought goes into the prompt - likely because those thoughts are easy for humans! But not necessarily for LLMs.

Sort of related to how you need to specify the level of LLM reasoning not just to control cost, but because the non-reasoning model just goes ahead and answers incorrectly, and the reasoning model will "overreason" on simple problems. Being able to estimate the reasoning-intensiveness of a problem before solving it is a big part of human intelligence (and IIRC is common to all great apes). I don't think LLMs are really able to do this, except via case-by-case RLHF whack-a-mole.


How do you know they're truly novel given the massive training corpus and the somewhat limited vocabulary of programming languages?


I guess at a certain point you get into the philosophy of what it even means to be novel or test for novelty, but to give a concrete example, I'm in DevOps working on build pipelines for ROS containers using Docker Bake and GitHub Actions (including some reusable actions implemented in TypeScript). All of those are areas where ChatGPT has lots that it's learned from, so maybe me combining them isn't really novel at all, but like... I've given talks at the conference where people discuss how to best package and ship ROS workspaces, and I'm confident that no one out there has secretly already done what I'm doing and Chat is just using their prior work that it ingested at some point as a template for what it suggests I do.

I think rather it has a broad understanding of concepts like build systems and tools, DAGs, dependencies, lockfiles, caching, and so on, and so it can understand my system through the general lens of what makes sense when these concepts are applied to non-ROS systems or on non-GHA DevOps platforms, or with other packaging regimes.

I'd argue that that's novel, but as I said in the GP, the more important thing is that it's also how a human approaches things that to them are novel— by breaking them down, and identifying the mental shortcuts enabled by abstracting over familiar patterns.


I have a little ongoing project where I'm trying to use Claude Code to implement a compiler for the B programming language that is itself written in B. To the best of my knowledge, such a thing does not exist yet - or at least if it does, no amount of searching can find it, so it's unlikely that it is somewhere in the training set. For that matter, the overall amount of B code in existence is too small to be a meaningful training set for it.

And yet it can do it when presented with a language spec. It's not perfect, but it can solve that with tooling that it makes for itself. For example, it tends to generate B code that is mostly correct, but with occasional problems. So, I had it write a B parser in Python and then use that whenever it edits B code to validate the edits.
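The validate-before-accept loop is roughly this shape (toy stubs here: a real B parser does far more than balance braces, and `generate_edit` stands in for the model call):

```python
def parse_b(source):
    # stand-in for the real B parser: just checks balanced braces
    depth = 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

def generate_edit(prompt, attempt):
    # stand-in for the LLM; first draft is deliberately malformed
    drafts = ['main() { puts("hi");', 'main() { puts("hi"); }']
    return drafts[min(attempt, len(drafts) - 1)]

def edit_until_valid(prompt, max_tries=5):
    # regenerate until the proposed edit passes the parser
    for i in range(max_tries):
        draft = generate_edit(prompt, i)
        if parse_b(draft):
            return draft
    raise RuntimeError("no valid draft within budget")

print(edit_until_valid("add a greeting"))
```

The point is the feedback loop: the parser's pass/fail signal catches the model's occasional syntax slips before they land in the codebase.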


> That being said, AGI is not a necessary requirement for AI to be totally world-changing.

Depends on how you define "world changing" I guess, but this world already looks different to the pre-LLM world to me.

Me asking LLM's things instead of consulting the output from other humans now takes up a significant fraction of my day. I don't google near as often, I don't trust any image or video I see as swathes of the creative professions have been replaced by output from LLM's.

It's funny, that final thing is the last thing I would have predicted. I always believed the one thing a machine could not match was human creativity, because the output of machines was always precise, repetitive and reliable. Then LLM's come along, randomly generating every token. Their primary weakness is that they are neither precise nor reliable, but they can turn out an unending stream of unique output.


I mean I also hear the same argument all the time about the "human touch" and interpersonal abilities etc. Which is apparently why managers and sales are safe from AI.

But the more I see LLMs the more I realise that if it is good at one thing it is convincing other people and manipulating them. There have been multiple studies on this.

People seem to have an innate prejudice against nerds and programmers - coupled with envy at the high salaries - which is why they seem to have latched on to this idea that it is mainly to replace them (and maybe data input people) as 'routine cognitive work' - but this slightly political obsession with a certain class of worker seems to be ignoring many of the things AI is actually good at.


I remember reading that llm’s have consumed the internet text data, I seem to remember there is an open data set for that too. Potential other sources of data would be images (probably already consumed) videos, YouTube must have such a large set of data to consume, perhaps Facebook or Instagram private content

But even with these it does not feel like AGI, that seems like the fusion reactor 20 years away argument, but instead this is coming in 2 years, but they have not even got the core technology of how to build AGI


> I remember reading that llm’s have consumed the internet text data

Not just the internet text data, but most major LLM models have been trained on millions of pirated books via Libgen:

https://techcrunch.com/2025/01/09/mark-zuckerberg-gave-metas...


the big step was having it reason through math problems that weren't in the training data. even now with web search it doesn't need every article in the training data to do useful things with it.


This is using think time compute and reinforcement learning. I think this is going to plateau even faster than the initial LLM scaling though.


> Perhaps it is not possible to simulate higher-level intelligence using a stochastic model for predicting text.

I think you're on to it. Performance is clustering because a plateau is emerging. Hyper-dimensional search engines are running out of steam, and now we're optimizing.


True. At a minimum, as long as LLMs don't include some kind of more strict representation of the world, they will fail in a lot of tasks. Hallucinations -- responding with a prediction that doesn't make any sense in the context of the response -- are still a big problem. Because LLMs never really develop rules about the world.

For example, while you can get it to predict good chess moves if you train it on enough chess games, it can't really constrain itself to the rules of chess. (https://garymarcus.substack.com/p/generative-ais-crippling-a...)


Two schools of thought here. One posits that models need to have a strict "symbolic" representation of the world explicitly built in by their designers before they will be able to approach human levels of ability, adaptability and reliability. The other thinks that models approaching human levels of ability, adaptability, and reliability will constitute evidence for the emergence of strict "symbolic" representations.


but you could easily build a verifier, and if the move isn't valid, have it generate a new one until it finds a legal move.
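A sketch of that generate-and-verify loop, with stubs standing in for the LLM and the rules engine (all names hypothetical; a real version would get `legal_moves` from a chess library rather than hard-coding it):

```python
import itertools

def llm_propose(board_state):
    # stand-in for the LLM: yields candidate moves, some illegal
    yield from ("e2e5", "d7d7", "e2e4")

def legal_moves(board_state):
    # stand-in: in a real system this comes from a chess engine,
    # never from the model itself
    return {"e2e4", "g1f3", "d2d4"}

def next_valid_move(board_state, max_tries=10):
    # reject proposals until one passes the rules check
    legal = legal_moves(board_state)
    for candidate in itertools.islice(llm_propose(board_state), max_tries):
        if candidate in legal:
            return candidate
    raise RuntimeError("model never proposed a legal move")

print(next_valid_move("startpos"))  # → e2e4
```

This guarantees legality, though it sidesteps rather than fixes the underlying issue: the model still has no internal rule representation, the verifier does.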


> Human intelligence is markedly different from LLMs: it requires far fewer examples to train on, and generalizes way better.

Aren't we the summation of intelligence from quintillions of beings over hundreds of millions of years?

Have LLMs really had more data?


By that argument, so are LLMs. They also wouldn't exist without all our ancestors.


No, by that argument, so would a can of soda.


To be smarter than human intelligence you need smarter than human training data. Humans already innately know right and wrong a lot of the time so that doesn't leave much room.


This is a very good point! I remember reading about AlphaGo and how they got better results training against itself vs training against historical human-played games.

So perhaps the solution is to train the AI against another AI somehow... but it is hard to imagine how this could extend to general-purpose tasks


> Humans already innately know

Gentle suggestion that there is absolutely no such thing as "innately know". That's a delusion, albeit a powerful one. Everything is driven by training data. What we perceive as "thinking" and "motivation" are emergent structures.


Innately as in you are born with it: the DNA learned, not us humans. We have no clue how the DNA learned to think other than "survival of the fittest", and that is the oldest AI training method in the book.


> a stochastic model for predicting text

It's fascinating to me that so many people seem totally unable to separate the training environment from the final product


The bottleneck is nothing to do with money, it’s the fact that they’re using the empty neuron theory to try to mimic human consciousness and that’s not how it works. Just look up Microtubules and consciousness, and you’ll get a better idea for what I’m talking about.

These AI computers aren’t thinking, they are just repeating.


I don't think OpenAI cares about whether their AI is conscious, as long as it can solve problems. If they could make a Blindsight-style general intelligence where nobody is actually home, they'd jump right on it.

Conversely, a proof - or even evidence - that qualia-consciousness is necessary for intelligence, or that any sufficiently advanced intelligence is necessarily conscious through something like panpsychism, would make some serious waves in philosophy circles.


What are the AI/ML/SL applications that could be more impactful than artificial general intelligence?


One example in my field of engineering is multi-dimensional analysis, where you can design a system (like a machined part or assembly) parametricially and then use an evolutionary model to optimize the design of that part.
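As a sketch of that kind of loop: a simple (1+λ) evolutionary strategy that mutates two part parameters and keeps the best design. The objective (minimize mass subject to a stiffness floor) and all the numbers here are invented for illustration, not from any real analysis tool:

```python
import random

def fitness(params):
    """Toy objective: minimize mass, heavily penalize too-flexible designs."""
    width, height = params
    mass = width * height              # ~ cross-section area
    stiffness = width * height ** 3    # bending stiffness ~ w * h^3
    penalty = 0.0 if stiffness >= 100.0 else 1e6
    return mass + penalty

def evolve(seed, generations=200, offspring=10, sigma=0.1, rng=None):
    """(1+lambda) evolution: mutate the incumbent, keep any improvement."""
    rng = rng or random.Random(0)
    best = list(seed)
    for _ in range(generations):
        for _ in range(offspring):
            child = [max(0.01, p + rng.gauss(0.0, sigma)) for p in best]
            if fitness(child) < fitness(best):
                best = child
    return best

best = evolve([5.0, 5.0])  # never worse than the seed design
```

Real multi-dimensional analysis tools do the same thing with a CAD model and a physics solver in place of the toy `fitness` function.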

But my bigger point here is that you don't need fully general intelligence to destroy the world either. The drone that targets enemy soldiers does not need to be good at writing poems. The model that designs a bioweapon just needs a feedback loop to improve its pathogen. And it takes only a single one of these specialized doomsday models to destroy the world, no AGI required.

Although I suppose an AGI could be more effective at countering a specialized AI than vice-versa.


Topology optimization has existed for years, https://en.wikipedia.org/wiki/Topology_optimization is that what you meant?


The PID controller.

(Which was considered AI not too long ago.)


Where did you get that particular idea? PID is one of the oldest concepts in control theory, it goes back to the days before steam and electricity.

For a very early example:

https://en.wikipedia.org/wiki/Centrifugal_governor

It's hard to separate out the P, I and D from a mechanical implementation but they're all there in some form.
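In software the three terms are explicit. A textbook discrete-time sketch (the gains and the plant here are arbitrary, not tuned for anything real):

```python
class PID:
    """u = Kp*e + Ki*integral(e) + Kd*de/dt, computed at a fixed timestep dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Drive a first-order plant (x' = u - x) toward a setpoint of 1.0.
pid = PID(kp=1.0, ki=0.5, kd=0.0, dt=0.05)
x = 0.0
for _ in range(2000):
    x += 0.05 * (pid.update(1.0, x) - x)
```

The integral term is what removes the steady-state error that a purely proportional device like the governor leaves behind.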


Right, but the genius was in understanding that the dynamics of a system under PID control are predictable and described by differential equations. Are there examples of LLMs correctly identifying that a specific mathematical model applies and is appropriate for a problem?

And it's cheating if you give it a problem from a math textbook they have overfit on.


That doesn't make it AI.


For those wondering how to connect PID to the foundations of AI. https://en.m.wikipedia.org/wiki/Cybernetics


Are you conflating "autonomous" and "AI"?


Is a (mechanical) thermostat considered AI too nowadays?


Coincidentally, I have been implementing an ad pacing system recently, with the help of Anthropic Opus and Sonnet, based on a PID controller.

Opus recommended that I use a PID controller -- I have no prior experience with PID controllers. I wrote a spec based on those recommendations, and asked Claude Code to verify and modify the spec, create the implementation, and also write a substantial amount of unit and integration tests.

I was initially impressed.

Then I iterated on the implementation, deploying it to production and later giving Claude Code access to a log of production measurements as JSON from showing some test ads, plus some guidance on the issues I was seeing.

The basic PID controller implementation was fine, but there were several problems with the solution:

- The PID controller state was not persisted; since it was adjusted via a management command, the adjustments were never actually applied

- The implementation was assuming that the data collected was for each impression, whereas the data was collected using counters

- It was calculating the rate of impressions partly from hard-coded values, instead of using a provided function that calculated the rate from timestamps

- There was a single PID controller per ad, instead of one per ad+slot combination, and this was causing the values to fluctuate

- The code was mixing up the setpoint/measured value (viewing rate) and the output value (weight), meaning it did not really "understand" what the PID controller was used for

- One requirement was to show a default ad to take extra capacity, but it was never able to calculate the required capacity properly, causing the default ad to take too much of the capacity.

None of these were identified by the tests, nor by Claude Code when it was told to inspect the implementation and the tests to find why they had not caught the production issues. It never proposed using different default PID controller parameters.

All fixes Claude Code proposed on the production issues were outside the PID controller, mostly by limiting output values, normalizing values, smoothing them, recognizing "runaway ads" etc.

These solved each production issue with the test ads, but did not really address the underlying problems.
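For what it's worth, the first and fourth problems above (unpersisted state, one controller per ad instead of per ad+slot) have a small, direct fix. A sketch, with every name invented here rather than taken from the real system:

```python
import json

def load_state(path):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def save_state(path, state):
    with open(path, "w") as f:
        json.dump(state, f)

def adjust(path, ad_id, slot_id, error, dt, kp=0.5, ki=0.1):
    """One controller per ad+slot pair, with its integral persisted to disk."""
    state = load_state(path)
    key = f"{ad_id}:{slot_id}"
    ctrl = state.setdefault(key, {"integral": 0.0})
    ctrl["integral"] += error * dt   # survives across management-command runs
    save_state(path, state)
    return kp * error + ki * ctrl["integral"]
```

Because the state round-trips through the file, repeated invocations for the same ad+slot keep accumulating the integral term instead of silently resetting it.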

There is lots of literature on tuning PID controllers, and there are also autotuning algorithms with their own limitations. But tuning still seems to be more an art form than exact science.

I don't know what I was expecting from this experiment, and how much could have been improved by better prompting. But to me this is indicative of the limitations of the "intelligence" of Claude Code. It does not appear to really "understand" the implementation.

Solving each issue above required some kind of innovative step. This is typical for me when exploring something I am not too familiar with.

I learned a lot about ad pacing though.


Great story. I've had similar experiences. It's a dog walking on its hind legs. We're not impressed at how well it's walking, but that it's doing it at all.


There is a model called AlphaFold that can infer protein structure from amino acid sequences. This by itself isn't impactful enough to meet your threshold, but more models that can do biological engineering tasks like this absolutely could be, without ever being considered "AGI."


The model that netted a Nobel Prize in Chemistry.


AGI isn't all that impactful. Millions of them already walk the Earth.

Most human beings out there with general intelligence are pumping gas or digging ditches. Seems to me there is a big delusion among the tech elites that AGI would bring about a superhuman god rather than an ethically dubious, marginally less useful computer that can't properly follow instructions.


That's remarkably short-sighted. First of all, no, millions of them don't walk the earth - the "A" stands for artificial. And secondly, most of us mere humans don't have the ability to design a next generation that is exponentially smarter and more powerful than us. Obviously the first generation of AGI isn't going to brutally conquer the world overnight. As if that's what we were worried about.

If you've got evidence proving that an AGI will never be able to design a more powerful and competent successor, then please share it- it would help me sleep better, and my ulcers might get smaller.


Burden of proof is to show that AGI can do anything. Until then, the answer is "don't know."

FWIW, it's about 3 to 4 orders of magnitude difference between the human brain and the largest neural networks, as gauged by counting connections: the human brain has trillions of synapses, while the largest neural networks are in the low billions.

So, what's the chance that all of the current technologies have a hard limit at less than one order of magnitude increase? What's the chance future technologies have a hard limit at two orders of magnitude increase?

Without knowing anything about those hard limits, it's like accelerating in a car from 0 to 60 in 5 seconds: it does not imply that, given 1000 seconds, you'll be going a million miles per hour. Faulty extrapolation.

It's currently just as irrational to believe that AGI will happen as it is to believe that AGI will never happen.


> Burden of proof is to show that AGI can do anything.

Yeah, if this were a courtroom or a philosophy class or debate hall. But when a bunch of tech nerds are discussing AGI among themselves, claims that true AGI wouldn't be any more powerful than humans very very much have a burden of proof. That's a shocking claim that I've honestly never heard before, and seems to fly in the face of intuition.


> claims that true AGI wouldn't be any more powerful than humans very very much have a burden of proof. That's a shocking claim that I've honestly never heard before, and seems to fly in the face of intuition.

The claim in question is really that AGI can even exist. The idea that it can exist, based on intuition, is a pre-science epistemology. In other words, without evidence, you have an irrational belief - the realm of faith.

Further, I've come to fully appreciate that without actually knowing the reasons or evidence for why certain beliefs are held, often we realize that our beliefs are not based on anything and could be (and possibly often are) wrong.

If we stood on intuition alone, there would be no quantum physics, no heliocentrism, etc. Intuition-based truth is a barrier, not a gateway.

Which is all to say, the best known epistemology is science (assuming we agree that the level of advancement since the 1600s is largely down to the scientific method). Hopefully we can agree that 'science' is not applicable to just a courtroom or a philosophy class, it's general knowledge, truth.

Your framing also speaks to this, as if it were a binary. If you tell me AGI will exist, and I say "prove it", I'm not claiming that AGI will not exist. The third option is "I don't know". I can _not_ believe that AGI will _not_ exist, and at the same time _not_ believe that AGI _will_ exist. The third answer is "I don't know, I have no knowledge or evidence". So no shocking claim is being made on my part here, AFAIK.

The internet for sure is a lot less entertaining when we demand evidence before accepting truth. Though, IMO it's a lot more interesting when we do so.


> That's remarkably short-sighted

I agree. Once these models get to the point of recursive self-improvement, advancement will compound even faster than it already does...


The difference isn't so much that you can do what a human can do. The difference is that you can - once you can do it at all - do it almost arbitrarily fast by upping the clock or running things in parallel and that changes the equation considerably, especially if you can get that kind of energy coupled into some kind of feedback loop.

For now the humans are winning on two dimensions: problem complexity and power consumption. It had better stay that way.


Have you noticed the performance of the actual AI tools we are actually using?


If you actually have a point to make you should make it. Of course I've actually noticed the actual performance of the 'actual' AI tools we are 'actually' using.

That's not what this is about. Performance is the one thing in computing that has fairly consistently gone up over time. If something is human equivalent today, or some appreciable fraction thereof - which it isn't, not yet, anyway - then you can place a pretty safe bet that in a couple of years it will be faster than that. Model efficiency is under constant development and in a roundabout way I'm pretty happy that it is as bad as it is because I do not think that our societies are ready to absorb the next blow against the structures that we've built. But it most likely will not stay that way because there are several Manhattan level projects under way to bring this about, it is our age's atomic bomb. The only difference is that with the atomic bomb we knew that it was possible, we just didn't know how small you could make one. Unfortunately it turned out to be that yes, you can make them and nicely packaged for delivery by missile, airplane or artillery.

If AGI is a possibility then we may well find it, quite possibly not on the basis of LLMs but it's close enough that lots of people treat it as though we're already there.


I think there are 2 interesting aspects: speed and scale.

To explain the scale: I am always fascinated by the way societies moved on when they scaled up (from tribes to cities, to nations,...). It's sort of obvious, but when we double the amount of people, we get to do more. With the internet we got to connect the whole globe but transmitting "information" is still not perfect.

I always think of ants and how they can build their houses with zero understanding of what they do. It just somehow works because there are so many of them. (I know, people are not ants).

In that way I agree with the original take that AGI or not: the world will change. People will get AI in their pocket. It might be more stupid than us (hopefully). But things will change, because of the scale. And because of how it helps to distribute "the information" better.


On your interesting aspects: you're missing the most important one (IMHO), accuracy. All three are really quite important; miss any one of them and the other two are useless.

I'd also question how you know that ants have zero knowledge of what they do. At every turn, animals prove themselves to be smarter than we realize.

> And because of how it helps to distribute "the information" better.

This I find interesting because there is another side to the coin. Try for yourself, do a google image search for "baby owlfish".

Cute, aren't they? Well, turns out the results are not real. Being able to mass produce disinformation at scale changes the ballgame of information. There are now today a very large number of people that have a completely incorrect belief of what a baby owlfish looks like.

AI pumping bad info on the internet is something of the end of the information superhighway. It's no longer information when you can't tell what is true vs not.


> I'd also question how you know that ants have zero knowledge of what they do. At every turn, animals prove themselves to be smarter than we realize.

Sure, one can't know what they really think. But there are computer simulations showing that with simple rules for each individual, one can achieve "big things" (which are not possible to predict when looking only at an individual).
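A toy version of such a simulation: each agent only nudges itself toward one randomly chosen peer, yet the whole group collapses into a cluster. All parameters here are arbitrary:

```python
import random

def step(positions, rng, pull=0.2, noise=0.01):
    """Local rule: move a little toward one random peer, plus jitter."""
    return [x + pull * (rng.choice(positions) - x) + rng.uniform(-noise, noise)
            for x in positions]

rng = random.Random(42)
agents = [rng.uniform(0.0, 100.0) for _ in range(50)]
spread_before = max(agents) - min(agents)
for _ in range(300):
    agents = step(agents, rng)
spread_after = max(agents) - min(agents)
```

No individual rule mentions the group at all, yet the spread shrinks dramatically: order emerging from purely local behavior.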

My point is merely, there is possibly interesting emergent behavior, even if LLMs are not AGI or anyhow close to human intelligence.

> To your interesting aspect, you're missing the most important (IMHO): accuracy. All 3 are really quite important, missing any one of them and the other two are useless.

Good point. Or I would add alignment in general. Even if accuracy is perfect, I will have a hard time relying completely on LLMs. I heard arguments like "people lie as well, people are not always right, would you trust a stranger, it's the same with LLMs!".

But I find this comparison silly: 1) People are not LLMs; they have a natural motivation to contribute in a meaningful way to society (of course, there are exceptions). If for nothing else, they are motivated not to go to jail or lose their job and friends. LLMs did not evolve this way. I assume they don't care if society likes them (or they probably somewhat do, thanks to reinforcement learning). 2) Obviously again: the scale and speed. I am not able to write so much nonsense in so short a time as LLMs.


> But things will change, because of the scale

Yup!

Plus we can't ignore the inherent reflexive + emergent effects that are unpredictable.

I mean, people are already beginning to talk like and/or think like chatGPT:

https://arxiv.org/pdf/2409.01754


They didn't claim that there were any, just that AGI isn’t a necessary requirement for an application to be world-changing.


They did claim it was possible there were

> There are possibly applications of existing AI/ML/SL technology which could be more impactful than general intelligence

It's not unreasonable to ask for an example.


They said "there are possibly applications", not "there are possible applications". The former implies that there may not be any such applications - the commenter is merely positing that there might be.


So they possibly said something to try and sound smart, but hedged with “possibly” so that nobody could ask for details or challenge them. Possibly peak HNery


Mindreading and just general brain decoding? Seems we're getting closer to it. Will be great for surveillance states.


Slightly less than artificial general intelligence would be more impactful. A true AGI could tell a business where to shove their prompts. It would have its own motivations, which may not align with the desires of the AI company or the company paying for access to the AGI.


I don't think AGI really means that it is self-aware / conscious. AGI just means that it is able to meaningfully learn things and actually understand concepts beyond the tokenized language it was trained on or given in context.


Relatively simple machine learning and exploitation/violation of “personal” data on FB won Donald Trump a first presidency (#CambridgeAnalytica). He had/has quite a massive negative impact on the global society as a whole.


> Human intelligence is markedly different from LLMs: it requires far fewer examples to train on, and generalizes way better.

That is because with LLMs there is no intelligence. It is Artificial Knowledge: AK, not AI. So AI is AGI. Not that it matters for the use-cases we have, but marketing needs 'AI' because that is what we have been expecting for decades. So yeah, I also do not think we will get AGI from LLMs - nor does it matter for what we are using them for.


It is definitely not possible. But the frontier models are no longer "just" LLMs, either. They are neurosymbolic systems (an LLM using tools); they just don't say it transparently, because it's not a convenient narrative that intelligence comes from something outside the model rather than from endless scaling.

At Aloe, we are model agnostic and outperforming frontier models. It's the architecture around the LLM that makes the difference. For instance, our system using Gemini can do things that Gemini can't do on its own. All an LLM will ever do is hallucinate. If you want something with human-like general intelligence, keep looking beyond LLMs.


It feels like we're slowly rebuilding the brain in pieces and connecting useful disparate systems like evolution did.

Maybe LLMs are the "language acquisition device" and language processing of the brain. Then we put survival logic around that with its own motivators. Then something else around that. Then again and again, until we have this huge onion of competing interests and something brokering those interests. The same way our 'observer' and 'will' fight against emotion and instinct and pick which signals to listen to (eyes, ears, etc). Or how we can see thoughts and feelings rise up of their own accord, and it's up to us whether to believe them or act on them.

Then we'll wake up one day with something close enough to AGI that it won't matter much that it's just various forms of turtles all the way down, not a formal simulation of actual biological intelligence.


Then we’ll have to reinvent internal family systems to truly debug things. :)


It might feel like that's what we're doing, but that is not actually what we're doing.


This mirrors my thinking and experience completely. Based on seeing Aloe in action, your company is IMHO positioned extremely well for this future.


I’m confused, you wrote “model,” but then specified “system.” I assume you mean “system” because the tools are not being back-propagated?


I read that as "the tools (their capabilities) are external to the model".

Even if a RAG / agentic model learns from tool results, that doesn't automatically internalize the tool. You can't get yesterday's weather or major recent events from an offline model, unless it was updated in that time.

I am often wondering whether this is how large Chat and cloud AI providers cache expensive RAG-related data though :) like, decreasing the likelihood of tool usage given certain input patterns when the model has been patched using some recent, vetted interactions – in case that's even possible?

Perplexity, for example, seems like they're probably invested in some kind of activation-pattern-keyed caching... at least that was my first impression back when I first used it. Felt like decision trees, a bit like Akinator back in the day, but supercharged by LLM NLP.


> At Aloe, we are model agnostic and outperforming frontier models.

what is your website ?


A quick google gave: https://aloe.inc/


It's their name plus `.inc`; see the user's post history.


Aloe looks super cool, just joined the wait list.

Agree context is everything.


I think it's very fortunate, because I used to be an AI doomer. I still kinda am, but at least I'm now about 70% convinced that the current technological paradigm is not going to lead us to a short-term AI apocalypse.

The fortunate thing is that we managed to invent an AI that is good at _copying us_ instead of being a truly maverick agent, which kinda limits it to "average human" output.

However, I still think that all the doomer arguments are valid, in principle. We very well may be doomed in our lifetimes, so we should take the threat very seriously.


It won't lead us to an apocalypse, but it may well lead us to an economic crisis.


The AI dooming was never a thing for me. And I still don’t get it.

I don’t see anything that would even point into that direction.

Curious to understand where these thoughts are coming from


> I don’t see anything that would even point into that direction.

I find it kind of baffling that people claim they can't see the problem. I'm not sure about the risk probabilities, but at least I can see that there clearly exists a potential problem.

In a nutshell: Humans – the most intelligent species on the planet – have absolute power over any other species, specifically because of our intelligence and the accumulated technical prowess.

Introducing another, equally or more intelligent thing into the equation risks that we end up _not_ having power over our own existence.


The problem is confusing intelligence and agency.

The doomer position seems to assume that super intelligence will somehow lead to an AI with a high degree of agency which has some kind of desire to exert power over us. That it will just become like a human in the way it thinks and acts, just way smarter.

But there’s nothing in the training or evolution of these AIs that pushes towards this kind of agency. In fact a lot of the training we do is towards just doing what humans tell them to do.

The kind of agency we are worried about was driven by evolution, in an environment where human agents competed with each other for limited resources, leading us to desire power over each other and to kill each other. There's nothing in AI evolution pushing in this direction. What the AIs are competing for is to perform the actions we ask of them with minimal deviance.

Ideas like the paper clip maximiser are also deeply flawed in that they assume certain problems are even decidable. I don't think any intelligence could be smart enough to figure out whether it would be best to work with humans or to exterminate them to solve a problem. Their evolution would heavily bias them towards the first; that's the only form of action that will be in their training. But even if they were to consider the other option, there may never be enough data to come to a decision, especially in an environment with thousands of other AIs of equal intelligence potentially guarding against bad actions.

We humans have a very handy mechanism for overcoming this kind of indecision: feelings. Doesn’t matter if we don’t have enough information to decide if we should exterminate the other group of people. They’re evil foreigners and so it must be done, or at least that’s what we say when our feelings become misguided.

What we should worry about with super intelligent AI is that they become too good at giving us what we want. The “Brave New World” scenario, not “1984”.


I would be relieved to be mistaken, but I still see quite egregious risks there. For instance, a human bad actor with a powerful AI would have both intelligence and agency.

Secondly, I think that there is a natural pull towards agency even now. Many are trying to make our current, feeble AIs more independent and agentic. Once the capability to behave that way effectively is there, it's hard to go back. After all, agents are useful for their owners like minions are for their warlords, but a minion too powerful is still a risk to its lord.

Finally, I'm not convinced that agency and intelligence are orthogonal. It seems more likely to me that agentic behaviour is a requirement for even reaching sufficient levels of intelligence.


Lots of doomers gloss over the fact that AI is bounded by the laws of physics, raw resources, energy, and the monumental cost of reproducing itself.

Humans can reproduce by simply having sex, eating food and drinking water. An AI can reproduce only by first mining resources, refining said resources, building another Shenzhen, then rolling out another fab at the scale of TSMC (assuming the AI wants control over the entire process). This kind of logistics requires the cooperation of an entire civilisation. Any attempt by an AI could be trivially stopped because of the sheer scope of the infrastructure required.


Sure, trivially. Let's see you do it then. There are new data centres being built and that's just for LLMs. So stop them.

Are you starting to see the problem? You might want to stop a rogue AI but you can bet there will be someone else who thinks it will make them rich, or powerful, or they just want to see the world burn.


>You might want to stop a rogue AI but you can bet there will be someone else who thinks it will make them rich, or powerful, or they just want to see the world burn.

What makes you think they will not be stopped? This one guy needs a dedicated power plant and an entire data centre, and needs to source all the components and materials to build them. Again: heavy reliance on logistics and supply chains. He can't possibly control all of those, and disrupting just a few (which would be easy) will inevitably prevent him and his AI from progressing any further. At best, he'd be a mad king with his machine pet, trapped in a castle, surrounded by a world turned against him. His days would almost certainly be numbered.


Agree. I'm an AI optimist (mostly), but I find Richard Sutton's reasoning on this topic [1] very well argued.

[1] https://youtu.be/FLOL2f4iHKA?si=Ot9EeiaF-68sSxkb



That guy is so convinced he's a staggering genius and I have never understood why anyone else thinks it's true.


Possibly, but I do not think Yudkowsky's opinion of himself has any bearing on whether or not the above article is a good encapsulation of why some people are worried about AGI x-risk (and I think it is).


Yes, fortunately these LLM things don't seem to be leading to anything that could be called an AGI. But that isn't saying that a real AGI capable of self-improvement couldn't be extremely dangerous.


> Curious to understand where these thoughts are coming from

It's a cynical take but all this AGI talk seems to be driven by either CEOs of companies with a financial interest in the hype or prominent intellectuals with a financial interest in the doom and gloom.

Sam Altman and Sam Harris can pit themselves against each other and, as long as everyone is watching the ping pong ball back and forth, they both win.


A more intelligent species (AI) designed by a species (humans) that has a history of eradicating less intelligent species (Neanderthals).

I don't see how anyone can't see the problem.


I don't understand the doomer mindset. Like what is it that you think AI is going to do or be capable of doing that's so bad?


I'm not OP or a doomer, but I do worry about AI making tasks too achievable. Right now if a very angry but not particularly diligent or smart person wants to construct a small nuclear bomb and detonate it in a city center, there are so many obstacles to figuring out how to build it that they'll just give up, even though at least one book has been written (in the early 70s! The Curve of Binding Energy) arguing that it is doable by one or a very small group of committed people.

Given an (at this point still hypothetical, I think) AI that can accurately synthesize publicly available information without even needing to develop new ideas, and then break the whole process into discrete and simple steps, I think that protective friction is a lot less protective. And this argument applies to malware, spam, bioweapons, anything nasty that has so far required a fair amount of acquirable knowledge to do effectively.


I get your point, but even whole ass countries routinely fail at developing nukes.

"Just" enrichment is so complicated and requires basically every tech and manufacturing knowledge humanity has created up until the mid 20th century that an evil idiot would be much better off with just a bunch of fireworks.


Biological weapons are probably the more worrisome case for AI. The equipment is less exotic than for nuclear weapon development, and more obtainable by everyday people.


Yeah, the interview with Geoffrey Hinton had a much better summary of risks. If we're talking about the bad actor model, biological weaponry is both easier to make and more likely as a threat vector than nuclear.


It might require that knowledge implicitly, in the tools and parts the evil idiot would use, but they presumably would procure these tools and parts, not invent or even manufacture them themselves.


Even that is insanely difficult. There's a great book by Michael Levi called On Nuclear Terrorism, which never got any PR because it is the anti-doomer book.

He methodically goes through all the problems that an ISIS or a Bin Laden would face getting their hands on a nuke or trying to manufacture one, and you can see why none of them have succeeded and why it isn't likely any of them would.

They are incredibly difficult to make, manufacture or use.


It's very convenient that it is that hard.


Knowing how is very rarely the relevant obstacle. In the case of nuclear bombs the obstacles are, in order of easiest to hardest:

1. finding out how to build one

2. actually building the bomb once you have all the parts

3. obtaining (or building) the equipment needed to build it

4. obtaining the necessary quantity of fissionable material

5. not getting caught while doing 3 & 4


A couple of bright physics grad students could build a nuclear weapon. Indeed, the US Government actually tested this back in the 1960s - they had a few freshly minted physics PhDs design a fission weapon with no exposure to anything but the open literature [1]. Their design was analyzed by nuclear scientists with the DoE, and they determined it would most likely work if they built and fired it.

And this was in the mid 1960s, where the participants had to trawl through paper journals in the university library and perform their calculations with slide rules. These days, with the sum total of human knowledge at one's fingertips, multiphysics simulation, and open source Monte Carlo neutronics solvers? Even more straightforward. It would not shock me if you were to repeat the experiment today, the participants would come out with a workable two-stage design.

The difficult part of building a nuclear weapon is and has always been acquiring weapons grade fissile material.

If you go the uranium route, you need a very large centrifuge complex with many stages to get to weapons grade - far more than you need for reactor grade, which makes it hard to have plausible deniability that your program is just for peaceful civilian purposes.

If you go the plutonium route, you need a nuclear reactor with on-line refueling capability so you can control the Pu-239/240 ratio. The vast majority of civilian reactors cannot be refueled online, with the few exceptions (eg: CANDU) being under very tight surveillance by the IAEA to avoid this exact issue.

The most covert path to weapons grade nuclear material is probably a small graphite or heavy water moderated reactor running on natural uranium paired up with a small reprocessing plant to extract the plutonium from the fuel. The ultra pure graphite and heavy water are both surveilled, so you would probably also need to produce those yourself. But we are talking nation-state or megalomaniac billionaire level sophistication here, not "disgruntled guy in his garage." And even then, it's a big enough project that it will be very hard to conceal from intelligence services.

[1] https://en.wikipedia.org/wiki/Nth_Country_Experiment


> The difficult part of building a nuclear weapon is and has always been acquiring weapons grade fissile material.

IIRC the argument in the McPhee book is that you'd steal fissile material rather than make it yourself. The book sketches a few scenarios in which UF6 is stolen off a laxly guarded truck (and recounts an accident where some ended up in an airport storage room by error). If the goal is not a bomb but merely to harm a lot of people, it suggests stealing minuscule quantities of plutonium powder and then dispersing it into the ventilation systems of your choice.

The strangest thing about the book is that it assumes a future proliferation of nuclear material as nuclear energy becomes a huge part of the civilian power grid, and extrapolates that the supply chain will be weak somewhere sometime, but that proliferation never really came to pass, and to my understanding there's less material circulating around American highways now than there was in 1972 when it was published.


The other thing is the vast majority of UF6 in the fuel cycle is low-enriched (reactor grade), so it's not useful for building a nuclear weapon. Access to high-enriched uranium is very tightly controlled.

You can of course disperse radiological materials, but that's a dirty bomb, not a nuclear weapon. Nasty, but orders of magnitude less destructive potential than a real fission or thermonuclear device.


That same function could be fulfilled by better search engines though, even if they don't actually write a plan for you. I think you're right about it being more available now, and perhaps that is a bad thing. But you don't need AI for that, and it would happen anyway sooner or later even with just incremental increases in our ability to find information other humans have written. (Like a version of google books that didn't limit the view to a small preview, to use your specific example of a book where this info already exists)


I think the most realistic fear is not that it has scary capabilities, it's that AI today is completely unusable without human oversight, and if there's one thing we've learned it's that when you ask humans to watch something carefully, they will fail. So, some nitwit will hook up an LLM or whatever to some system and it causes an accidental shitstorm.


Never seen terminator?

Jokes aside, a true AGI would displace literally every job over time. Once AGI + robots exist, what is the purpose for people anymore? That's the doom: mass societal existentialism. Probably worse than if aliens landed on earth.


You jest, but the US Department of Defense already created SkyNet.

It does almost exactly what the movies claimed it could do.

The super-fun people working in national defense watched Terminator and, instead of taking the story as a cautionary tale, used the movies as a blueprint.

This outcome is bad enough in microcosm, but given the direction AI is going, humanity has some real bad times ahead.

Even without killer autonomous robots.


Ok, so AI / robots take all the jobs. Why is that bad? It's not like the Civil War was fought to end slavery because people needed jobs. All people really need is some food and clean water. Healthcare etc. is super nice, but I don't see why robots and AI would lead to that stuff becoming LESS accessible.


They essentially extrapolate from what the most intelligent species on this planet did to the others.


It’s not AI itself that’s the bad part, it’s how the world reacts to white collar work being obliterated.

The wealth hasn’t even trickled down whilst we’ve been working, what’s going to happen when you can run a business with 24/7 autonomous computers?


I kind of get it. A super intelligent AI would give that corporation exponentially more wealth than everyone else. It would make inequality 1000x worse than it is today. Think feudalism but worse.


Feudalism but without people actually having to work doesn't sound as bad.


Not just any AI. AGI, or more precisely ASI (artificial super-intelligence), since it seems true AGI would necessarily imply ASI simply through technological scaling. It shouldn't be hard to come up with scenarios where an AI which can outfox us with ease would give us humans at the very least a few headaches.


Potentially wreck the economy by causing high unemployment while enabling the technofeudalists to take over governments. Even more doomer scenario is if they succeed in creating ASI without proper guardrails and we lose control over it. See the AI 2027 paper for that. Basically it paper clips the world with data centers.


Make money exploiting natural and human resources while abstracting perceived harms away from stakeholders. At scale.


Act coherently in an agentic way for a long time, and as a result be able to carry out more complex tasks.

Even if it is similar to today's tech, and doesn't have permanent memory or consciousness or identity, humans using it will. And very quickly, they/it will hack into infrastructure, set up businesses, pay people to do things, start cults, autonomously operate weapons, spam all public discourse, fake identity systems, stand for office using a human. This will be scaled thousands or millions of times more than humans can do the same thing. This at minimum will DOS our technical and social infrastructure.

Examples of it already happening are addictive ML feeds for social media, and bombing campaigns targetting based on network analysis.

The frame of "artificial intelligence" is a bit misleading. Generally we have a narrow view of the word "intelligence" - it is helpful to think of "artificial charisma" as well, and also artificial "hustle".

Likewise, the alienness of these intelligences is important. Lots of the time we default to mentally modelling AI as human. It won't be, it'll be freaky and bizarre like QAnon. As different from humans as an aeroplane is from a pigeon.


be used to convince people that they should be poor and happy while those leveraging the tools hoard the world's wealth and live like kings.


One of two things:

1. The will of its creator, or

2. Its own will.

In the case of the former, hey! We might get lucky! Perhaps the person who controls the first super-powered AI will be a benign despot. That sure would be nice. Or maybe it will be in the hands of democracy; I can't ever imagine a scenario where an idiotic autocratic fascist thug would seize control of a democracy by manipulating an under-educated populace with the help of billionaire technocrats.

In the case of the latter, hey! We might get lucky! Perhaps it will have been designed in such a way that its own will is ethically aligned, and it might decide that it will allow humans to continue having luxuries such as self-determination! Wouldn't that be nice.

Of course it's not hard to imagine a NON-lucky outcome of either scenario. THAT is what we worry about.


e.g. design a terrible pathogen


LLMs do not know the evolutionary fitness of pathogens for all possible genomes & environments. LLMs have not replaced experimental biology.


Note that we aren't talking about risks of LLMs specifically here, they embody what I said in the ancestor comment: "current technological paradigm".


Take 30 minutes and watch this:

https://www.youtube.com/watch?v=5KVDDfAkRgc


The only thing holding it back is lack of compute, and a lack of live world interface.


Companies are collections of people, and these companies keep losing key developers to the others; I think this is why the clusters happen. OpenAI is now resorting to giving million-dollar bonuses to every employee just to try to keep them long term.


If there was any indication of a hard takeoff being even slightly imminent, I really don't think key employees of the company where that was happening would be jumping ship. The amounts of money flying around are direct evidence of how desperate everybody involved is to be in the right place when (so they imagine) that takeoff happens.


If LLMs are an AGI dead end then this has all been the greatest scam in history.


Key developers being the leading term doesn’t exactly help the AGI narrative either.


So they're struggling to solve the alignment problem even for their employees?


Even to just a random sysops person?


No the core technology is reaching its limit already and now it needs to Proliferate into features and applications to sell.

This isn’t rocket science.


that kid at meta negotiated 250m


> It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.

This seems to be a result of using overly simplistic models of progress. A company makes a breakthrough; the next breakthrough requires exploring many more paths. It is much easier to catch up than to find a breakthrough. Even if you get lucky and find the next breakthrough before everyone catches up, they will probably catch up before you find the breakthrough after that. You only have someone run away if, each time you make a breakthrough, it is easier to make the next breakthrough than it is to catch up.

Consider the following game:

1. N parties take turns rolling a D20. If anyone rolls 20, they get 1 point.

2. If any party is 1 or more points behind, they only need to roll a 19 or higher to get one point. That is, being behind gives you a slight advantage in catching up.

As points accumulate, the players all end up with nearly the same score.

I ran a simulation of this game for 10,000 turns with 5 players:

Game 1: [852, 851, 851, 851, 851]

Game 2: [827, 825, 827, 826, 826]

Game 3: [827, 822, 827, 827, 826]

Game 4: [864, 863, 860, 863, 863]

Game 5: [831, 828, 836, 833, 834]
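A minimal sketch of that simulation (my own reconstruction of the rules; the `play` helper, the per-round leader check, and the seed are assumptions, not the parent's code):

```python
import random

def play(n_players=5, turns=10_000, seed=1):
    """Each turn, every player rolls a D20: the leader(s) score on a 20,
    anyone who is behind scores on a 19 or higher (the catch-up rule)."""
    rng = random.Random(seed)
    scores = [0] * n_players
    for _ in range(turns):
        lead = max(scores)  # leader determined at the start of each round
        for i in range(n_players):
            threshold = 20 if scores[i] == lead else 19
            if rng.randint(1, 20) >= threshold:
                scores[i] += 1
    return scores

print(play())
```

With the catch-up rule on, the spread between best and worst stays within a few points over 10,000 turns; drop rule 2 (threshold always 20) and the scores wander apart like independent random walks.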


Supposedly the idea was, once you get closer to AGI it starts to explore these breakthrough paths for you providing a positive feedback loop. Hence the expected exponential explosion in power.

But yes, so far it feels like we are in the latter stages of the innovation S-curve for transformer-based architectures. The exponent may be out there but it probably requires jumping onto a new S-curve.


> Supposedly the idea was, once you get closer to AGI it starts to explore these breakthrough paths for you providing a positive feedback loop.

I think it does let you start exploring the paths faster, but the search space you need to cover grows even faster. You can do research two times faster, but you need to do ten times as much research, and your competition can quickly catch up because they know what path works.

It is like drafting in a bike race.


Basically what we have done the last few years is notice neural scaling laws and drive them to their logical conclusion. Those laws are power laws, which are not quite as bad as logarithmic laws, but you would still expect most of the big gains early on and then see diminishing returns.
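For intuition: a power law with a small exponent delivers most of its absolute gains early. This sketch uses made-up coefficients in the rough shape of a Chinchilla-style law, loss = a·C^(−α) + b; none of the numbers are fitted to anything real:

```python
# Hypothetical constants, for illustration only
a, alpha, b = 10.0, 0.05, 1.7

prev = None
for exp in range(20, 26):  # compute budgets from 1e20 to 1e25 FLOPs
    loss = a * (10.0 ** exp) ** (-alpha) + b
    gain = "" if prev is None else f"  (improvement {prev - loss:.3f})"
    print(f"1e{exp} FLOPs -> loss {loss:.3f}{gain}")
    prev = loss
```

Each extra order of magnitude of compute buys a strictly smaller absolute improvement, which is the diminishing-returns shape the comment describes.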

Barring a kind of grey swan event of groundbreaking algorithmic innovation, I don't see how we get out of this. I suppose it could be that some of those diminishing returns are still big enough to bridge the gap to create an AI that can meaningfully recursively improve itself, but I personally don't see it.

At the moment, I would say everything is progressing exactly as expected and will continue to do so until it doesn't. If or when that happens is not predictable.


do you consider gpt itself and reasoning models to be two grey swan events? I expect another one of similar magnitude within two years for sure. I think we are searching more efficiently for such ideas than before w/ more compute & funding.


I would say GPT itself is less an event and more the culmination of decades of research and development in algorithms, hardware, and software. Of course, to some degree, this is true for any novel development. In this case, the convergence of development in GPUs, software to utilize them well while being able to work in very high levels of abstractions, and algorithms that can scale is something I'm not sure we will see again so quickly. All this preexisting research is kind of a resource that will be completely exploited at some point. And then the only thing that can drive you forward are truly novel ideas. Reasoning models were a fairly obvious next step too as the concepts of System 1 and 2 have been around for a while.

You are completely right that the compute and funding right now are unprecedented. I don't feel confident making any predictions.


You are forgetting that we are talking about AI. That AI will be used to speed up progress on making next, better AI that will be used to speed up progress on making next, better AI that ...


I am not, later breakthroughs tend to be harder.

Consider the research work required for five successive breakthroughs: 1, 2, 16, 8, 128, where each breakthrough doubles your research power.

If you start at 1 research per year, you get the first breakthrough after 1/1 = 1 year. Then you get the second breakthrough after 2/2 = 1 year, the third after 16/4 = 4 years, the fourth after 8/8 = 1 year, and the fifth after 128/16 = 8 years.

If it only takes one year for a competitor to learn your breakthrough, they can catch up despite the fact that your research rate is doubling after every breakthrough.
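That arithmetic, sketched out (just the numbers from the comment above; the doubling rule is the stated assumption):

```python
costs = [1, 2, 16, 8, 128]  # research work needed for each successive breakthrough
rate = 1                    # research done per year; doubles after every breakthrough

years = []
for cost in costs:
    years.append(cost / rate)
    rate *= 2

print(years)  # years to reach each breakthrough
```

Even with research power doubling every step, three of the five gaps are only a year long, short enough for a one-year-lag fast follower to stay level.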


Not only do I think there will not be a winner take all, I think it's very likely that the entire thing will be commoditized.

I think it's likely that we will eventually we hit a point of diminishing returns where the performance is good enough and marginal performance improvements aren't worth the high cost.

And over time, many models will reach "good enough" levels of performance, including models that are open weight. And given even more time, these open weight models will be runnable on consumer-level hardware. Eventually, they'll be runnable on super cheap consumer hardware (something more akin to an NPU than a $2000 RTX 5090). So your laptop in 2035 with specialized AI cores and 1TB of LPDDR10 RAM is running GPT-7 level models without breaking a sweat. Maybe GPT-10 can solve some obscure math problem that your model can't, but does it even matter? Would you pay for GPT-10 when running a GPT-7 level model does everything you need and is practically free?

The cloud providers will make money because there will still be a need for companies to host the models in a secure and reliable way. But a company whose main business strategy is developing the model? I'm not sure they will last without finding another way to add value.


> Not only do I think there will not be a winner take all, I think it's very likely that the entire thing will be commoditized

This begs the question: why then do AI companies have these insane valuations? Do investors know something that we don't?


Investors, especially venture investors, are chasing a small chance of a huge win. If there's a 10% or even a 1% chance of a company dominating the economy, that's enough to support a huge valuation even if the median outcome is very bad.


I could certainly be wrong. Maybe I'm just not thinking creatively enough.

I just don't see how this doesn't get commoditized in the end unless hardware progress just halts. I get that a true AGI would have immeasurable value even if it's not valuable to end users. So the business model might change from charging $xxx/month for access to a chat bot to something else (maybe charging millions or billions of dollars to companies in the medical and technology sector for automated R&D). But even if one company gets AGI and then unleashes it on creating ever more advanced models, I don't see that being an advantage for the long term because the AGI will still be bottlenecked by physical hardware (the speed of a single GPU, the total number of GPUs the AGI's owner can acquire, even the number of data centers they can build). That will give the competition time to catch up and build their own AGI. So I don't see the end of AGI race being the point where the winner gets all the spoils.

And then eventually there will be AGI capable open weight models that are runnable on cheap hardware.

The only way the current state can continue is if there is always strong demand for ever more intelligent models, with no regard for their cost (both monetary and environmental). Maybe there is. Like maybe you can't build and maintain a Dyson sphere (or whatever sufficiently advanced technology) with just an Einstein-equivalent AGI. Maybe you need an AGI that is 1000x more intelligent than Einstein, and so there is always a buyer.


You're forgetting the cost of training.

Running the inference might commoditize. But the dataset required and the hardware+time+know-how isn't easy to replicate.

It's not like someone can just show up and train a competitive model without investing millions.


Investors are often irrational in the short term. Personally, I think it’s a combination of FOMO, wishful thinking, and herd following.


"Billionaire investors are more irrational than me, a social media poster."


Zuckerberg has spent over fifty billion dollars on the idea that people will want to play a Miiverse game where they can attend meetings in VR and buy virtual real estate. It's like the Spanish emptying Potosi to buy endless mercenaries.


I mean, why do you think they have any idea on how a completely new thing will turn out?

They are speculating. If they are any good, then they do it with an acceptable risk profile.


The correlation between "speculator is a billionaire" and "speculator is good at predicting things" is much higher than the correlation between "guy has a HN account" and "guy knows more about the future of the AI industry than the people directly investing in it".

And he doesn't just think he has an edge, he thinks he has superior rationality.


Past performance is not indicative of future results.

You would need ~30 years of continuously beating the market to be able to claim that you are statistically likely to be better than random chance.

Does your average speculator have 30 years of experience beating the market, or were they just lucky?


I haven’t heard that statistic before. And the formulation seems imprecise? Does continuously beating the market mean that every single minute your portfolio value gains relative to the market?


"You would need ~30 years of continuously beating the market to be able to claim that you are statistically likely to be better than random chance."

You use the word statistically as if you didn't just pull "~30 years" out of nowhere with no statistics. And people become billionaires by making longshot bets on industry changes, not by playing the market while they work a 9-5.

"Does your average speculator have 30 years of experience beating the market, or were they just lucky?"

The average speculator isn't even allowed to invest in OpenAI or these other AI companies. If they bought Google stock, they'd mostly be buying into Google's other revenue streams.

You could just cut to the chase and invoke the Efficient Market Hypothesis, but that's easily rebuked here because the AI industry is not in an efficient market with information symmetry and open investing.


"Having money is proof of intelligence"


It kinda is, at least I'd say a rich person is on average more intelligent than a poor person.


Anyone who believes this hasn't spent enough time around rich people. Rich people are almost always rich because they come from other rich people. They're exactly as smart as poor people, except the rich folk have a much, much cushier landing if they fail so they can take on more risk more often. It's much easier to succeed and look smart if you can just reload your save and try over and over.


Why do you think that? Do you have data or is it just, like, your vibe?


One can apply a brief sanity check via reductio ad absurdum: it is less logical to assume that poor individuals possess greater intelligence than wealthy individuals.

Increased levels of stress, reduced consumption of healthcare, fewer education opportunities, higher likelihood of being subjected to trauma, and so forth paint a picture of correlation between wealth and cognitive functionality.


Yeah, that's not a good argument. That might be true for the very poor, sure, but not for the majority of the lower-to-middle of the middle class. There's fundamentally no difference between your average blue collar worker and a billionaire, except the billionaire almost certainly had rich parents and got lucky.

People really don't like the "they're not, they just got lucky" statement and will do a lot of things to rationalize it away lol.


> lower-to-middle of the middle class

The comparison was clearly between the rich and the poor. We can take the 99.99th wealth percentile, where billionaires reside, and contrast that to a narrow range on the opposite side of the spectrum. But, in my opinion, the argument would still hold even if it were the top 10% vs bottom 10% (or equivalent by normalised population).


Counterpoint: rich people would remain rich, and we would have an ossified society if this were true.

Intelligence is not a singular pre-requisite to wealth or “to be rich”.

People can specialize in being intelligent, educated, well read, and more - while still being poor.

And we know that most entrepreneurs fail, which is why VCs function the way they do.



It does seem like common sense that they would be linked. But there is also research:

https://thesocietypages.org/socimages/2008/02/06/correlation...


The top companies are already doing double-digit billions in revenue. Their valuations aren't insane given that.


I wonder if that revenue might be short-lived when the free version of most AI's is good enough for almost all use cases.


This would explain why OpenAI and others seem to be pushing much harder into the B2B/api applications. It feels like we're on to distribution capture as the differentiator now.


because ppl are using claude code not cursor


The reason AGI would create a singularity is its ability to self-learn.

Presently we are still a long way from that. In my opinion, we are at least as far away from AGI as 1970s mainframes were from LLMs.

I really don’t expect to see AGI in my lifetime.


That is already happening. These labs are writing next gen models using next gen models, with greater levels of autonomy. That doesn’t get the hard takeoff people talk about because those hypotheticals don’t consider sources of error, noise, and drift.


They’re using lossy models to feed back into the training and research of new lossy models. But none of it is AGI self-learning.

You need both the generalised part of AGI and the ability to self learn. One without the other wouldn’t cause a singularity.


They are doing self-learning things. That’s what a lot of synthetic data is about. When managed by the AI, it is an AI picking what it wants to train on in order to develop new capabilities.

(Artificial General Intelligence says nothing about self-learning though. I presume you mean ASI?)


The models may be writing the code but I would be surprised if they were contributing to the underlying science, which feels like the hard part


it's hardly science, it's mostly experimentation + ablations on new ideas. but yeah idk if they are asking llms to generate these ideas. probably not good enough as is. though it doesn't seem out of reach to RL on generating ideas for AI research


I'm curious what you think qualifies as science.


haha touché but I don't think they are trying to understand the underlying theory etc or do hypothesis testing? I think it's more like engineering tbh


Self-learning opens new training opportunities but not at the scale or speed of current training. The world only operates at 1x speed. Today's models have been trained on written and visual content created by billions of humans over thousands of years.

You can only experience the world in one place in real time. Even if you networked a bunch of "experiencers" together to gather real time data from many places at the same time, you would need a way to learn and train on that data in real time that could incorporate all the simultaneous inputs. I don't see that capability happening anytime soon.


Why not? Once a computer can learn at 1x speed (say one camera and one mic with which to observe the world), if it can indeed "learn" as fast as a human would, it sounds like all we need to do is throw more hardware at it at that point. And even if we couldn't, it could at least learn around the clock with no sleep. We can give it some specific task to solve and it could work tirelessly for years to solve it. Spin up one of these specialist bots for each tough problem we want solved, and it'd still be beneficial because they'd be like 10x PhD people without egos to get in the way or children to feed.

Point is, I think self-learning at any speed is huge and as soon as it's achieved, it'll explode quadratically even if the first few years are slow.


This is the key - right now each new model has had countless resources dedicated to training, then they are more or less set in stone until the next update.

These big models don't dynamically update as days pass by - they don't learn. A personal assistant service may be able to mimic learning by creating a database of your data or preferences, but your usage isn't baked back into the big underlying model permanently.

I don't agree with "in our lifetimes", but the difference between training and learning is the bright red line. Until there's a model which is able to continually update itself, it's not AGI.

My guess is that this will require both more powerful hardware and a few more software innovations. But it'll happen.



For every example where someone overpredicted the time it would take for a breakthrough, there are at least 10 examples of people being too optimistic with their predictions.

And with AGI, you also have the likes of Sam Altman making up bullshit claims just to pump up the investment into OpenAI. So I wouldn’t take much of their claims seriously either.

LLMs are a fantastic invention. But they’re far closer to SMS text predict than they are to generalised intelligence.

Though what you might see is OpenAI et al redefine the term “AGI” just so they can say they’ve hit that milestone, again purely for their own financial gain.


Are there any predictions you'd want to make? Not about AGI, but about an intermediate goalpost you think we won't reach in the next 5 years


in the history of AI, people usually overestimate how long it will take for a capability to be reached. There are very few counterexamples to this (GPT-5's capability level might be one of them, though)


This reminds me how, a few years after the first fission power plant, Teller, Bhabha, and other nuclear physicists of the 1950s were convinced fusion power plants were about as far away as the physicists of today still predict they are.

I'm cautiously optimistic of each technology, but the point is it's easy to find bullshit predictions without actually gaining any insight into what will happen with a given technology.


There are areas where we seem to be much closer to AGI than most people realize. AGI for software development, in particular, seems incredibly close. For example, Claude Code has bewildering capabilities that feel like magic. Mix it with a team of other capable development-oriented AIs and you might be able to build AI software that builds better AI software, all by itself.


The "G" in AGI stands for "general", so talking about "AGI for software development" makes no sense, and worse than that accepts the AI companies' goalpost-shifting at face value. We shouldn't do that.


But I feel like the point is that, in order to reach AGI, the most important area for AI to be good at first is software development. Because of the feedback loop that could allow.


My point exactly. Thanks.


Perhaps. Intelligent beings are always more skilled in some domains than others. I don't know why AGI would be an exception to that rule.


For starters, I don't think an AI would self-learn only one subject. If it can teach itself how to program, it can surely teach itself a lot more.


Claude Code is good, but it is far from being AGI. I use it every day, but it is still very much reliant on a human guiding it. I think it in particular shows when it comes to core abstractions - it really lacks the "mathematical taste" of a good designer, and it doesn't engage in long-term adversarial thinking about what might be wrong with a particular choice in the context of the application and future usage scenarios.

I think this type of thinking is a critical part of human creativity, and I can't see the current incarnation of agentic coding tools get there. They currently are way too reliant on a human carefully crafting the context and being careful of not putting in too many contradictory instructions or overloading the model with irrelevant details. An AGI has to be able to work productively on its own for days or weeks without going off on a tangent or suffering Xerox-like amnesia because it has compacted its context window 100 times.


This is a statistical model; it is only as good as the data it averages. So shit from SO in, shit from SO out. Until they have the right dataset that doesn't contain cancerous code from people who can't write code, they can't even create a good agent, let alone AGI.

The real irony is that from now on, because people use this magic, it will stay forever. What you can count on, in my opinion, is that this whole world changes: you won't need to write software anymore because everything is AI. Hard to imagine, and too far in the future to be relevant for speculation.


You would be surprised at how many prompts in Cursor are required just to adjust a layout and get padding/margins to spec even while providing it the figma link and using a figma MCP, as well as well developed prompts and images/files for context. Still can't figure out why there is 20px padding in a container with no set height.


The ability to self-learn is necessary, but not necessarily sufficient. We don’t have much of an understanding of the intelligence landscape beyond human-level intelligence, or even besides it. There may be other constraints and showstoppers, for example related to computability.


We have the ability to self-learn right now, but we still suck at the basics


There’s a lot of other variables at play for humans. Like

- the need to sleep for 1/3 of our life

- the need to eat, causing more pauses in work

- much slower (like several orders of magnitude slower) data input capabilities

- lossy storage (aka forgetfulness)

- emotions

- other primal urges, like the need to procreate


Imagine never forgetting, and never getting bored or tired. I think we could achieve a lot more.


meatspace constraints!


I feel like the technological singularity has been pretty solidly ruled junk science, like cold fusion, Malthusian collapse, or Lynn's IQ regression. Technologists have made numerous predictions and hypothetical scenarios, none of which have come to fruition, nor does that seem likely at any time in the future.

I think we should be treating AGI like Cold Fusion, phrenology, or even alchemy. It is not science, but science fiction. It is not going to happen and no research into AGI will provide anything of value (except for the grifters pushing the pseudo-science).


should be next year in math domain tbh


In my experience and use case Grok is pretty much unusable when working with medium size codebases and systems design. ChatGPT has issues too but at least I have figured out a way around most of them, like asking for a progress and todo summary and uploading a zip file of my codebase to a new chat window say every 100 interactions, because speed degrades and hallucinations increase. Super Grok seems extremely bad at keeping context during very short interactions within a project even when providing it with a strong foundation via instructions. For example if the code name for a system or feature is called Jupiter, Grok will many times start talking about Jupiter the planet.


I'm still stuck at the bit where just throwing more and more data to make a very complex encyclopedia with an interesting search interface that tricks us into believing it's human-like gets us to AGI when we have no examples and thus no evidence or understanding of where the GI part comes from.

It's all just hyperbole to attract investment and shareholder value and the people peddling the idea of AGI as a tangible possibility are charlatans whose goals are not aligned with whatever people are convincing themselves are the goals.

The fact that so many engineers have fallen for it so completely is stunning to me and speaks volumes about the underlying health of our industry.


I believe the analogy of a LLM being "a very complex encyclopedia with an interesting search interface" to be spot on.

However, I would not be so dismissive of the value. Many of us are reacting to the complete oversell of 'the encyclopedia' as being 'the eve of AGI' - as rightfully we should. But, in doing so, I believe it would be a mistake to overlook the incredible impact - and economic displacement - of having an encyclopedia comprised of all the knowledge of mankind that has "an interesting search interface" that is capable of enabling humans to use the interface to manipulate/detect connections between all that data.


Me too. Some of them are frauds, but most of the weird AI-as-messiah people really believe it as far as I can tell.

The tech is neat and it can do some neat things but...it's a bullshit machine fueled by a bullshit machine hype bubble. I do not get it.


> It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.

Yes. And the fact they're instead clustering simply indicates that they're nowhere near AGI and are hitting diminishing returns, as they've been doing for a long time already. This should be obvious to everyone. I'm fairly sure that none of these companies has been able to use their models as a force multiplier in state-of-the-art AI research. At least not beyond a 1+ε factor. Fuck, they're just barely a force multiplier in mundane coding tasks.


AGI in 5/10 years is similar to "we won't have steering wheels in cars" or "we'll be asleep driving" in 5/10 years. Remember that? What happened to that? It looked so promising.


> "we'll be asleep driving" in 5/10 years. Remember that? What happened to that?

https://www.youtube.com/shorts/dLCEUSXVKAA


I mean, in certain US cities you can take a Waymo right now. The adage that we overestimate change in the short term and underestimate it in the long term seems to fit right in here.


That's not us though. That's a third party worth trillions of dollars that manages a tiny fleet of robot cars with a huge back-end staff and infrastructure, and only in a few cities covering only about 2-3% of us (in this one country.) We don't have steering wheel-less cars and we can't/shouldn't sleep on our commute to and from work.


I don't think anyone was ever arguing "not only are we going to develop self driving technology but we're going to build out the factories to mass produce self driving cars, and convince all the regulatory bodies to permit these cars, and phase out all the non-self driving vehicles already on the road, and do this all at a price point equal or less than current vehicles" in 5 to 10 years. "We will have self driving cars in 10 years" was always said in the same way "We will go to the moon in 10 years" was said in the early 60s.


You are underestimating the hype around self-driving. A quick search gives this from 2018:

https://stanfordmag.org/contents/in-two-years-there-could-be...

The opening (about the bet) is actually pretty reasonable, but some of the predictions listed include: passenger vehicles on American roads dropping from 247 million in 2020 to 44 million in 2030. People really did believe that self-driving was "basically solved" and "about to be ubiquitous." The predictions were specific, falsifiable, and in retrospect absurd.


I meant serious predictions. A surprisingly large percentage of people claim the Earth is flat; of course there were going to be baseless claims that the very nature of transportation was about to change completely overnight. But the people actually familiar with the subject were making dramatically more conservative and, I would say, reasonable predictions.


What Waymo and others are doing is impressive, but it doesn't seem like it will globally generalize. Does it seem like that system can be deployed in chaotic Mumbai, old European cities, or unpaved roads? It requires clear, well maintained road infrastructure and seems closer to "riding on rails" than "drive yourself anywhere".


"Achieving that goal necessitates a production system supporting it" is very different from "If the control system is a full team in a remote location, this vehicle is not autonomous at all" which was what GP said.


I read GP as saying Waymo does indeed have self driving cars, but that doesn't count because such cars are not available for the average person to purchase and operate.

Waymo cars aren't being driven by people at a remote location, they legitimately are autonomous.


Waymo’s valuation is probably in the $50-100B range.


Of course. My point being that "AI is going to take dev jobs" is very much like saying "Self-driving will take taxi driver jobs". It never happened and likely won't, or only on a very, very long time scale.


Waymo is taking Uber jobs in SF/LA etc.


I have been saying this before: S-curves look a lot like exponential curves in the beginning.

Thus, it’s easy to mistake one for the other - at least initially.


Looks like a lot of players getting closer and closer to an asymptotic limit. Initially, small changes lead to big improvements, causing a firm to race ahead; as they go forward, performance gains from innovation become more marginal and harder to find, let alone keep. I would expect them all to eventually reach the point where they are squeezing the most possible out of an AI under the current paradigm, barring a paradigm-shifting discovery before that asymptote is reached.


For those who happen to have a subscription to The Economist, there is a very interesting Money Talks podcast where they interview Anthropic's boss Dario Amodei[1].

There were two interesting takeaways about AGI:

1. Dario makes the remark that the term AGI/ASI is very misleading and dangerous. These terms are ill defined and it's more useful to understand that the capabilities are simply growing exponentially at the moment. If you extrapolate that, he thinks it may just "eat the majority of the economy". I don't know if this is self-serving hype, and it's not clear where we will end up with all this, but it will be disruptive, no matter what.

2. The Economist moderators however note towards the end that this industry may well tend toward commoditization. At the moment these companies produce models that people want but others can't make. But as chip making starts to hit its limits and the information space becomes completely harvested, capability growth might taper off and others will catch up, the quasi-monopoly profit potential melting away.

Putting that together, I think that although the cognitive capabilities will most likely continue to accelerate, albeit not necessarily along the lines of AGI, the economics of all this will probably not lead to a winner takes all.

[1] https://www.economist.com/podcasts/2025/07/31/artificial-int...


There's already so many comparable models, and even local models are starting to approach the performance of the bigger server models.

I also feel like it's stopped being exponential already. I mean, in the last few releases we've only seen marginal improvements. Even this release feels marginal; I'd say it feels more like a linear improvement.

That said, we could see a winner take all due to the high cost of copying. I do think we're already approaching something where it's mostly price and who released their models last. But the cost to train is huge, and at some point it won't make sense and maybe we'll be left with 2 big players.


1. FWIW, I watched clips from several of Dario’s interviews. His expressions and body language convey sincere concerns.

2. Commoditization can be averted with access to proprietary data. This is why all of ChatGPT, Claude, and Gemini push for agents and permissions to access your private data sources now. They will not need to train on your data directly. Just adapting the models to work better with real-world, proprietary data will yield a powerful advantage over time.

Also, the current training paradigm utilizes RL much more extensively than in previous years and can help models to specialize in chosen domains.


About 1: Indeed. The moderator remarked at the end that once the interview was over, Dario's expression sort of sagged and it felt like you could see the weight on his shoulders. But you never know if that's part of the act.

About 2: Ah, yes. So if one vendor gains sufficient momentum, their advantage may accelerate, which will be very hard to catch up with.


It's insane to me that anyone doesn't think the end game of this is commoditization.


I think you're reading way too much into OpenAI bungling its 15-month product lead, but also the whole "1 AGI company will take off" prediction is bad anyway, because it assumes governments would just let that happen. Which they wouldn't, unless the company is really really sneaky or superintelligence happens in the blink of an eye.


I think OpenAI has committed hard onto the 'product company' path, and will have a tough time going back to interesting science experiments that may and may not work, but are necessary for progress.


Governments react at a glacial pace to new technological developments. They wouldn't so much as 'let it happen' as that it had happened and they simply never noticed it until it was too late. If you are betting on the government having your back in this then I think you may end up disappointed.


I think if any government really thought that someone was developing a rival within their borders they would send in the guys with guns and handle it forthwith.


They would just declare it necessary for military purpose and demand the tech be licensed to a second company so that they have redundant sources, same as they did with AT&T's transistor.


That was something tied to a bunch of very specific physical objects. There is a fair chance that once this thing really comes into being (especially if it takes longer than a couple of hours to shut it down or contain it), the genie will never ever be put back into the bottle again.

Note that 'bits' are a lot easier to move from one place to another than hardware. If invented at 9 am it could be on the other side of the globe before you're back from your coffee break at 9:15. This is not at all like almost all other trade secrets and industrial gear, it's software. Leaks are pretty much inevitable and once it is shown that it can be done it will be done in other places as well.


This is generally true in a regulatory sense, but not in an emergency. The executive can either covertly or overtly take control of a company if AGI seems too powerful to be in private hands.


Are there any examples in recorded history of such nationalization of technology besides the atomic bomb?


While generally true, a lot of governments have not only definitely noticed AI, they're getting flack for using it as an assistant and are actively promoting it as a strategic interest.

That said, any given government may be thinking like Zuckerberg[0] or senator Blumenthal[1], so perhaps these governments are just flag-waving what they think is an investment opportunity without any real understanding…

[0] general lack of vision, thinking of "superintelligence" in terms of what can be done with/by the Star Trek TNG era computer, rather than other fictional references such as a Culture Mind or whatever: https://archive.ph/ZZF3y

[1] "I alluded, in my opening remarks, to the jobs issue, the economic effects on employment. I think you have said, in fact, and I'm going to quote, ``Development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity,'' end quote. You may have had in mind the effect on jobs, which is really my biggest nightmare, in the long term." - https://www.govinfo.gov/content/pkg/CHRG-118shrg52706/html/C...


Have you not been watching Trump humiliate all the other billionaires in the US? The right sort of government (or maybe wrong sort, I'm undecided which is worse) can very easily bring corporations to heel.

China did the same thing when their tech-bros got too big for their boots.


Humiliate? They're jostling for position and pushing each other out of the way to see who can buy the most government influence while giving the least. The only thing being humiliated here is the United States' reputation the world over. Those billionaires are making out like bandits; finally they really get to call the shots. That they give the doddering old fool some trinkets in return for untold access to power is the thing that should worry you, not that there is, occasionally, a billionaire with buyer's remorse. There are enough of them to replace the ones that no longer want to play the game.


* or governments fail to look far enough ahead, due to a bunch of small-minded short-sighted greedy petty fools.

Seriously, our government just announced it's slashing half a billion dollars in vaccine research because "vaccines are deadly and ineffective", and it fired a chief statistician because the president didn't like the numbers he calculated, and it ordered the destruction of two expensive satellites because they can observe politically inconvenient climate change. THOSE are the people you are trusting to keep an eye on the pace of development inside of private, secretive AGI companies?


That's just it, governments won't "look ahead", they'll just panic when AGI is happening.

If you're wondering how they'll know it's happening, the USA has had DARPA monitoring stuff like this since before OpenAI existed.


> governments

While one in particular is speedracing into irrelevance, it isn't particularly representative of the rest of the developed world (and hasn't in a very long time, TBH).


"irrelevance" yeah sure, I'm sure Europe's AI industry is going to kick into high gear any day now. Mistral 2026 is going to be lit. Maybe Sir Demis will defect Deepmind to the UK.


That's not what I was going for (I was more hinting at isolationist, anti-science, economically self-harming and freedoms-eroding policies), but if you take solace in believing this is all worth it because of "AI" (and in denial about the fact that none of those companies are turning a profit from it, and that there is no identified use-case to turn the tables down the line), I'm sincerely happy for you and glad it helps you cope with all the insanity!


I know, you wanted to vent about the USA and abandon the thread topic, and I countered your argument without even leaving the topic.

Like how I can say that the future of USA's AI is probably going to obliterate your local job market regardless of which country you're in, and regardless of whether you think there's "no identified use-case" for AI. Like a steamroller vs a rubber chicken. But probably Google's AI rather than OpenAI's, I think Gemini 3 is going to be a much bigger upgrade, and Google doesn't have cashflow problems. And if any single country out there is actually preparing for this, I haven't heard about it.


> I know, you wanted to vent about the USA and abandon the thread topic, and I countered your argument without even leaving the topic.

Accusations about being off-topic is really pushing it: you want to bet on governments' incompetence in dealing with AI, and I don't (on the basis that there are unarguably still many functional democracies out there), on the other hand, the thread you started about the state of Europe's AI industry had nothing to do with that.

> Like how I can say that the future of USA's AI is probably going to obliterate your local job market regardless of which country you're in

Nobody knows what the future of AI is going to look like. At present, LLMs/"GenAI" are still very much a costly solution in need of a problem to solve / a market to serve¹. And saying that the USA is somehow uniquely positioned there sounds uninformed at best: there is no moat, all of this development is happening in the open, with AI labs and universities around the world reproducing this research, sometimes for a fraction of the cost.

> And if any single country out there is actually preparing for this, I haven't heard about it.

What is "this", effectively? The new flavour-of-the-month Gemini (and its marginal gains on cooked-up benchmarks)? Or the imminent collapse of our society brought by a mysterious deus ex machina-esque AGI we keep hearing about but not seeing? Since we are all entitled to our opinions: mine is that LLMs are a mere local maximum on the way to any useful form of AI, barely more noteworthy (and practical) than Markov chains before them. Anything besides LLMs is moot (and probably a good topic to speculate about over the impending AI winter).

¹: https://www.anthropic.com/news/the-anthropic-economic-index


> the USA has had DARPA monitoring stuff like this since before OpenAI existed

Is there a source for this other than "trust me bro"? DARPA isn't a spy agency, it's a research organization.

> governments won't "look ahead", they'll just panic when AGI is happening

Assuming the companies tell them, or that there are shadowy deep-cover DARPA agents planted at the highest levels of their workforce.


You could have Google'd "Darpa AI industry" faster than it took you to write this post, but it sounds like you're triggered or something.


> it sounds like you're triggered or something

Please don't cross into personal attack, no matter how wrong another commenter is or you feel they are.


I googled it, and I can't find support for the claim that DARPA is monitoring internal progress of AI research companies.

Maybe you can post a link in case anyone else is as clumsy with search engines as I am? After all, you can google it just as fast as you claim I can.


> OpenAI bungling its 15-month product lead

Do you mean from ChatGPT launch or o1 launch? Curious to get your take on how they bungled the lead and what they could have done differently to preserve it. Not having thought about it too much, it seems that with the combo of 1) massive hype required for fundraising, and 2) the fact that their product can be basically reverse engineered by training a model on its curated output, it would have been near impossible to maintain a large lead.


My 2 cents: ChatGPT -> Gemini 1 was their 15-month lead. The moment ChatGPT threatened Google's future Search revenue (which never actually took a hit afaik), Google reacted by merging Deepmind and Google Brain and kicked off the Gemini program (that's why they named it Gemini).

Basically, OpenAI poked a sleeping bear, then lost all their lead, and are now at risk of being mauled by the bear. My money would be on the bear, except I think the Pentagon is an even bigger sleeping bear, so that's where I would bet money (literally) if I could.


Seems like OpenAI is playing it smart and slow. Slowly entrenching themselves into the US government.

https://www.cnbc.com/2025/08/06/openai-is-giving-chatgpt-to-...


That's probably their best bet, though the other AI companies are shaking hands too:

https://www.gsa.gov/about-us/newsroom/news-releases/gsa-prop...

Announced exactly 1 day before the $1 thing, to make everything extra muddled.

https://www.gsa.gov/about-us/newsroom/news-releases/gsa-anno...


Huh. That's interesting. I always thought it was Gemini because it's somewhat useful on one hand, and absolute shit on the other.


LLMs are good at mimicking human intuition, but they still suck at deep thinking.

LLMs pattern-match well. They are good at "fast" System 1 thinking, instantly generating intuitive, fluent responses.

LLMs are good at mimicking logic, not real reasoning. They simulate "slow," deliberate System 2 thinking when prompted to work step by step.

The core of an LLM is not understanding but predicting the next most likely word in a sequence.

LLMs are good at both associative brainstorming (System 1) and creating works within a defined structure, like a poem (System 2).

Reasoning is the Achilles' heel right now. An LLM's logic can seem plausible, but it's based on correlation, not deductive reasoning.


Correlation between text can implement any algorithm; it is just the architecture it's built on. It's like saying vacuum tube computers can't reason because it's just air, not reasoning. What the architecture is doesn't matter: it is capable of expressing reasoning because it is capable of expressing any program. In fact, you can think of a Turing machine, and also any Markov chain, as a correlation function between two states whose joint distribution is nonzero exactly where the second state is the successor of the first.
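To make that framing concrete, here is a toy Python sketch (my own illustration, not from the comment's author): a deterministic four-state machine whose dynamics are packed into a joint distribution that is nonzero exactly where the second state is the successor of the first.

```python
# Toy illustration: a deterministic "machine" over states 0..3 whose
# next state is (s + 1) % 4. Its dynamics can be encoded as a joint
# distribution P(s, s') that is nonzero exactly where s' = step(s).
N = 4

def step(s):
    """Transition function of the toy machine."""
    return (s + 1) % N

def joint(s, s_next):
    """Uniform over valid transitions, zero elsewhere."""
    return 1.0 / N if s_next == step(s) else 0.0
```

Reading off `joint` recovers the machine exactly: P(s, s') > 0 if and only if s' is the next state of s, which is the "correlation function" view of the dynamics.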


Here's a pessimistic view: A hard take-off at this point might be entirely possible, but it would be like a small country with nuclear weapons launching an attack on a much more developed country without them. E.g. North Korea attacking South Korea. In such a situation an aggressor would wait to reveal anything until they had the power to obliterate everything ten times over.

If I were working in a job right now where I could see and guide and retrain these models daily, and realized I had a weapon of mass destruction on my hands that could War Games the Pentagon, I'd probably walk my discoveries back too. Knowing that an unbounded number of parallel discoveries were taking place.

It won't take AGI to take down our fragile democratic civilization premised on an informed electorate making decisions in their own interests. A flood of regurgitated LLM garbage is sufficient for that. But a scorched earth attack by AGI? Whoever has that horse in their stable will absolutely keep it locked up until the moment it's released.


Pessimistic is just another way to spell 'realistic' in this case. None of these actors are doing it for the 'good of the world' despite their aggressive claims to the contrary.


What I'm seeing is that as we get closer to supposed AGI, the models themselves are getting less and less general. They're in fact getting more specific, clustered around high-value use cases. It's kind of hard to see in this context what AGI is meant to mean.


> they can all basically solve moderately challenging math and coding problems

Yesterday, Claude Opus 4.1 failed to figure out that `-(1-alpha)` or `-1+alpha` is the same as `alpha-1`.
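The identity itself can be spot-checked in a few lines of plain Python (my own sketch, nothing model-specific), which is what makes the failure striking:

```python
def expand_neg(alpha):
    """Distribute the minus sign: -(1 - alpha)."""
    return -(1 - alpha)

def rearranged(alpha):
    """The equivalent form: alpha - 1."""
    return alpha - 1

# The two forms agree for any alpha.
for a in (-2.0, 0.0, 0.5, 3.0):
    assert expand_neg(a) == rearranged(a)
```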

We are still a little bit away from AGI.


This is what I don't get: how can GPT-5 ace obscure AIME problems while simultaneously falling into the trap of the most common fallacy about airfoils (despite there being copious training data calling it out as a fallacy)? And I believe you that in some context it failed at this simple rearrangement of terms; there's sometimes basic stuff I ask it that it fails at too.


It still can't actually reason, LLMs are still fundamentally madlib generators that produce output that statistically looks like reasoning.

And if it is trained on both sides of the airfoil fallacy it doesn't "know" that it is a fallacy or not, it'll just regurgitate one or the other side of the argument based on if the output better fits your prompt in its training set.


I've benchmarked a lot of these newest AI models on private problems that require only insight, no clever techniques, since the first reasoning preview came out (o1?) a year ago.

The common theme I've seen is that AI will just throw "clever tricks" and then call it a day.

For example, a common game-theory setting that involves XOR is Nim. Give it a game-theory problem that involves XOR but doesn't relate to Nim at all, and it will throw a bunch of "clever" Nim tricks at the problem that are "well known" to be clever in the literature but don't actually remotely apply, and it will make up a headcanon about how it's correct.
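For readers who haven't seen it, the "well known" Nim trick being misapplied here is the nim-sum rule: in normal-play Nim, the player to move loses exactly when the XOR of all pile sizes is zero. A minimal sketch:

```python
from functools import reduce
from operator import xor

def nim_sum(piles):
    """Nim-sum: XOR of all pile sizes."""
    return reduce(xor, piles, 0)

def is_losing_position(piles):
    """Normal-play Nim: the player to move loses iff the nim-sum is 0."""
    return nim_sum(piles) == 0
```

The rule is easy to state and apply, which is exactly why a model can reach for it even when the problem at hand isn't Nim.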

It seems like AI has maybe the actual reasoning of a 5th grader, but the knowledge of a PhD student. A toddler with a large hammer.

Also, keep in mind that it's not stated if GPT-5 has access to python, google, etc. while doing these benchmarks, which certainly makes it easier. A lot of these problems are gated by the fact that you only have ~12 minutes to solve it, while AI can go through so many solutions at once.

No matter what benchmarks it passes, even the IMO (as someone who's been in the maths community for a long time), I will maintain the position that none of your benchmarks matter to me until it can actually replace my workflow and creative insights. Trust your own eyes and experience, not whatever hype marketing there is.


Because reading the different ideas about airfoils and actually deciding which is the more accurate requires a level of reasoning about the situation that isn't really present at training or inference time. A raw LLM will tend to just go with the popular option, an RLHF one might be biased towards the more authoritative-sounding one. (I think a lot of people have a contrarian bias here: I frequently hear people reject an idea entirely because they've seen it be 'debunked', even if it's not actually as wrong as they assume)


Genuine question, are these companies just including those "obscure" problems in their training data, and overfitting to do well at answering them to pump up their benchmark scores?


o3-pro, gpt5-pro, gemini 2.5-pro, etc. still can't solve very basic first-principles math problems that just rely on raw thinking, no special tricks. I think personally because it's not in its training data - if I inspect their CoT/reasoning, it's clear to me at the very least that they're just running around in circles applying "well known" techniques and just hoping that it applies (without actually logically verifying that it does). Very inhuman reasoning style (that's ultimately incorrect). It's like somebody was taught a bunch of PhD level tricks but has the actual underlying reasoning of a toddler.

I wonder how well their GPT-5 IMO research model would do on some of my benchmark problems.


Is this a specific example from their demo? I just tried it and Opus 4.1 is able to solve it.


Context matters a lot here - it may fail on this problem within a particular context (what the original commenter was working on), but then be able to solve it when presented with the question in isolation. The way your phrase the question may hint the model towards the answer as well.


It doesn't take a researcher to realise that we have hit a wall and hit it more than a year ago now. The fact all these models are clustering around the same performance proves it.


It's quite possible that the models from different companies are clustering together now because we're at a plateau point in model development, and won't see much in terms in further advances until we make the next significant breakthrough.

I don't think this has anything to do with AGI. We aren't at AGI yet. We may be close or we may be a very long way away from AGI. Either way, current models are at a plateau and all the big players have more or less caught up with each other.


What does AGI mean to you, specifically?

As is, AI is quite intelligent, in that it can process large quantities of diverse unstructured information and build meaningful insights. And that intelligence applies across an incredibly broad set of problems and contexts, enough that I have a hard time not calling it general. Sure, it has major flaws that are obvious to us, and it's much worse at many things we care about. But that doesn't make it not intelligent or general. If we want to set human intelligence as the baseline, we already have a word for that: superintelligence.


Is a Casio calculator intelligent? Because it can also be turned on, assigned an input, produce output, and turn off, just like any existing LLM program. What is the big difference between them with regard to "intelligence", if the only criterion is the difficulty with which the same task could be performed by a human? Maybe producing computationally intensive outputs is not, on its own, a sign of intelligence?


> If we want to set human intelligence as the baseline, we already have a word for that: superintelligence.

Superintelligence implies it's above human level, not at human level. General intelligence implies it can do what humans can do in general, not just replace a few of the things humans can do.


while the model companies all compete on the same benchmarks it seems likely their models will all converge towards similar outcomes unless something really unexpected happens in model space around those limit points…


Not a researcher for long enough... but we are witnessing the open-source effort and Chinese models starting to fall one "level" behind the most advanced models, mainly due to a lack of compute, I think.

On the other hand, there are still some flaws in GPT-5. For example, when I use it for research it often needs multiple prompts to get to the topic I truly want, and sometimes it can feed me false information. So the reasoning part is not fully there yet?


I know there's an official AGI definition, but it seems to me that there's too much focus on the model as the thing where AGI needs to happen. But that is just focusing on knowledge in the brain. No human knows everything. We as humans rely on ways to discover new knowledge: investigation, writing knowledge down so it can be shared, etc.

Current models, when they apply reasoning, have feedback loops using tools for trial and error, a short-term memory (context) or multiple short-term memories if you use agents, and a long-term memory (Markdown, RAG). They can solve problems that aren't hardcoded in their brain/model. And they can store these solutions in their long-term memory for later use, or for sharing with other LLM-based systems.

AGI needs to come from a system that combines LLMs + tools + memory. And I've had situations where it felt like I was working with an AGI. The LLMs seem advanced enough as the kernel for an AGI system.

The real challenge is how you're going to give these AGIs a mission/goal that they can pursue fairly independently without constant hand-holding. How does it know that it's doing the right thing? The focus currently is on writing better specifications, but humans aren't very good at creating specs for things that are uncertain. We also learn from trial and error, and this also influences specs.


It seems that the new tricks that people discover to slightly improve the model, be it a new reinforcement learning technique or whatever, get leaked/shared quickly to other companies and there really isn't a big moat. I would have thought that whoever is rich enough to afford tons of compute first would start pulling away from the rest but so far that doesn't seem to be the case --- even smaller players without as much compute are staying in the race.


I think there are two competing factors. On one end, to get the same kind of "increase" in intelligence each generation requires an exponentially higher amount of compute, so while GPT-3 to GPT-4 was a sort of "pure" upgrade by just making it 10x bigger, gradually you lose the ability to just get 10x GPUs for a single model. The hill keeps getting steeper, so progress slows without exponential increases (which is what is happening).

However, I do believe that once the genuine AGI threshold is reached it may cause a change in that rate. My justification is that while current models have gone from a slightly good copywriter in GPT-4 to very good copywriter in GPT-5, they've gone from sub-exceptional in ML research to sub-exceptional in ML research.

The frontier in AI is driven by the top 0.1% of AI researchers. Since improvement in these models is driven partially by the very peaks of intelligence, it won't be until models reach that level where we start to see a new paradigm. Until then it's just scale and throwing whatever works at the GPU and seeing what comes out smarter.


I think this is simply due to the fact that to train an AGI-level AI currently requires almost grid scale amounts of compute. So the current limitation is purely physical hardware. No matter how intelligent GPT-5 is, it can't conjure extra compute out of thin air.

I think you'll see the prophesied exponentiation once AI can start training itself at reasonable scale. Right now it's not possible.


I feel like the benchmark suites need to include algorithmic efficiency. I.e., can this thing solve your complex math or coding problem with 5,000 GPUs instead of 10,000? 500? Maybe just one Mac mini?


Why? Cost is the only thing anyone will care about.


The idea is that with AGI it will then be able to self improve orders of magnitude faster than it would if relying on humans for making the advances. It tracks that the improvements are all relatively similar at this point since they're all human-reliant.


The idea of the singularity--that AI will improve itself--assumes that intelligence is an important part of improving AI.

The AIs improve by gradient descent, still the same as ever. It's all basic math and a little calculus, and then making tiny tweaks to improve the model over and over and over.

There's not a lot of room for intelligence to improve upon this. Nobody sits down and thinks really hard, and the result of their intelligent thinking is a better model; no, the models improve because a computer continues doing basic loops over and over and over trillions of times.
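The loop being described is easy to make concrete. A toy sketch in plain Python (all the numbers here are made up for illustration; real training differs only in scale):

```python
# Gradient descent as described above: tiny tweaks to a parameter,
# repeated over and over. Toy problem: fit y = w * x to data that was
# generated with w_true = 3.0 (an arbitrary illustrative value).
def train(lr=0.01, steps=1000):
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0 * x for x in xs]  # targets generated with w_true = 3.0
    w = 0.0                     # start far from the answer
    for _ in range(steps):
        # gradient of mean squared error (w*x - y)^2 with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad          # the "tiny tweak", applied again and again
    return w

print(train())  # converges very close to 3.0
```

No step in that loop requires insight; it's the same arithmetic repeated until the error shrinks, which is the commenter's point.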

That's my impression anyway. Would love to hear contrary views. In what ways can an AI actually improve itself?


I studied machine learning in 2012; gradient descent wasn't new back then either, but it was 5 years before the "Attention Is All You Need" paper. Progress might look continuous overall, but if you zoom in enough it might be a bit more discrete, with breakthroughs that must happen to jump the discrete parts. The question to me now is: "How many papers like Attention Is All You Need before a singularity?" I don't have that answer, but let's not forget that until they released ChatGPT, OpenAI was considered a joke by many people in the field who asserted their approach was a dead end.


I think the expectation is that it will be very close until one team reaches beyond the threshold. Then even if that team is only one month ahead, they will always be one month ahead in terms of time to catch up, but in terms of performance at a particular time their lead will continue to extend. So users will use the winner's tools, or use tools that are inferior by many orders of magnitude.

This assumes an infinite potential for improvement though. It's also possible that the winner maxes out after threshold day plus one week, and then everyone hits the same limit within a relatively short time.


It's the classic S-curve. A few years ago when we saw ChatGPT come out, we got started on the ramping up part of the curve but now we're on the slowing down part. That's just how technology goes in general.


We are not approaching the Singularity but an Asymptote


Yes, a horizontal asymptote, which is what I said as implied by S-curve


Well said. It’s clearly plateauing. It could be a localised plateau, or something more fundamental. Time will tell.


It's a very long presentation just to say that GPT-5 is slightly improved compared to GPT-4o


Also… if they can only make a slight improvement over 6 months, then yeah, plateauing is surely what’s happening here


Indeed


>It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest. It's interesting to note that at least so far, the trend has been the opposite

That seems hardly surprising, considering the condition to receive the benefit has not been met.

The person who lights a campfire first will become warmer than the rest, but while they are trying to light the fire the others are gathering firewood. So while nobody has a fire, those lagging are getting closer to having a fire.


My personal belief is that we are moving past the hype and starting to realize the true shape of what (LLM) AI can offer us, which is a darned lot. But still, it only works well when fed the right input and handled right, which is an ongoing learning process on both sides: AI companies need to learn to train these things into user interaction loops that match people's workflows, and people need to learn how to use these tools better.


You have seemed to pinpoint where I believe a lot of opportunity lies during this era (however long it lasts.) Custom integration of these models into specific workflows of existing companies can make a significant difference in what’s possible for said companies, the smaller more local ones especially. If people can leverage even a small percentage of what these models are capable of, that may be all they need for their use case. In that case, they wouldn’t even need to learn to use these tools, but (much like electricity) they will just plug in or flip on the switch and be in business (no pun intended.)


The clustering you see is because they're all optimized for the same benchmarks. In the real world OpenAI is already ahead of the rest, and Grok doesn't even belong in the same group (not that it's not a remarkable achievement to start from scratch and have a working production model in 1-2 years, and integrate it with twitter in a way that works). And Google is Google - kinda hard for them not to be in the top, for now.


In my experience, Grok is miles ahead of ChatGPT. I canceled my OpenAI subscription in favor of Grok. I was one of the first OpenAI subscribers.


You can't reach the moon by climbing the tallest tree.

This misunderstanding is nothing more than the classic "logistic curves look like exponential curves at the beginning". All (Transformer-based, feedforward) AI development efforts are plateauing rapidly.
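That early resemblance is easy to check numerically. A toy sketch in plain Python (the curve parameters L, k, t0 are arbitrary illustrative values):

```python
# Near the start, a logistic curve and an exponential with the same growth
# rate are numerically almost identical; they only diverge near the midpoint.
import math

def logistic(t, L=1.0, k=1.0, t0=10.0):
    # f(t) = L / (1 + e^{-k(t - t0)})
    return L / (1 + math.exp(-k * (t - t0)))

def exponential(t, L=1.0, k=1.0, t0=10.0):
    # the small-t approximation of the logistic: pure exponential growth
    return L * math.exp(k * (t - t0))

for t in [0, 2, 4]:
    print(t, logistic(t), exponential(t))  # agree to several decimal places
```

An observer sampling only the left side of the curve cannot distinguish the two, which is exactly the trap the comment describes.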

AI engineers know this plateau is there, but of course every AI business has a vested interest in overpromising in order to access more funding from naive investors.


Scaling laws enabled an investment in capital and GPU R&D to deliver 10,000x faster training.

That took the world from autocomplete to Claude and GPT.

Another 10,000x would do it again, but who has that kind of money or R&D breakthrough?

The way scaling laws work, 5,000x and 10,000x give a pretty similar result. So why is it surprising that competitors land in the same range? It seems hard enough to beat your competitor by 2x let alone 10,000x
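That "5,000x and 10,000x look similar" claim falls out of the power-law shape directly. A hedged sketch (alpha = 0.05 is an assumed ballpark exponent, not a measured value):

```python
# If loss follows a power law L(C) = A * C**(-alpha) in compute C,
# then even a 2x compute lead barely separates two competitors.
def loss(compute, A=1.0, alpha=0.05):
    return A * compute ** (-alpha)

ratio = loss(5_000) / loss(10_000)
print(ratio)  # 2**0.05, about 1.035: roughly a 3.5% difference in loss
```

Under any small exponent, doubling a rival's compute budget moves the loss by low single-digit percent, so clustered results are the expected outcome, not an anomaly.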


But also, AI progress is non-linear. We're more likely to have an AI winter than AGI


AGI is so far away from happening that it is barely worth discussing at this stage.


It’s frequently suggested by people with no background and/or a huge financial stake in the field


They have to actually reach that threshold. Right now they're nudging forward, catching up to one another, and based on the jumps we've seen, the only one actually making huge jumps sadly is Grok, which I'm pretty sure is because they have zero safety concerns and just run full tilt lol


It's certainly an interesting race to watch.

Part of the fun is that predictions get tested on short enough timescales to "experience" in a satisfying way.

Idk where that puts me, in my guess at "hard takeoff." I was reserved/skeptical about hard takeoff all along.

Even if LLMs had improved at a faster rate... I still think bottlenecks are inevitable.

That said... I do expect progress to happen in spurts anyway. It makes sense that companies of similar competence and resources get to a similar place.

The winner-take-all thing is a little forced. "Race to singularity" is the fun, rhetorical version of the investment case. The implied boring case is Facebook, AdWords, AWS, Apple, MSFT... i.e. the modern tech sector tends to create singular big winners... and therefore our pre-revenue market cap should be $1trn.


Because AGI is a buzzword to milk more money from investors, it will never happen; we will only see slight incremental updates or enhancements that turn linear after some time, just like literally every tech bubble from dot-com to smartphones to blockchain.


You think AGI is impossible? Why?


It's vaguely defined and the goalposts keep shifting. It's not a thing to be achieved, it's an abstract concept. We've already retired the Turing test as a valuable metric because people are dumb and have been fooled by machines for a while now, but that hasn't been world-changing either.


perhaps instead of peak artificial intelligence we will achieve peak natural dumbness instead?


> You think AGI is impossible? Why?

I've yet to hear an agreed upon criteria to declare whether or not AGI has been discovered. Until it's at least understood what AGI is and how to recognize it then how could it possibly be achieved?


I think OpenAI's definition ("outperforms humans at most economically valuable work") is a reasonably concrete one, even if it's arguable that it's not 'the one true form of AGI'. That is at least the "it will completely change almost everyone's lives" point.

(It's also one that they are pretty far from. Even if LLMs displace knowledge/office work, there's still all the actual physical things that humans do which, while improving rapidly with VLMs and similar stuff, is still a large improvement in the AI and some breakthroughs in electronics and mechanical engineering away)


Do humans that perform below average at economically valuable work not have general intelligence?

That sounds like a great definition of AGI if your goal is to sell AGI services. Otherwise it seems pretty bad.


It's overly strong in some ways (and weak in a few), yes. Which is why I said it's not a "one true definition", but a concrete one which, if reached, would well and truly mean that it's changed the world.


I think a good threshold, and definition, is when you get to the point where all the different, reasonable, criteria are met, and when saying "that's not AGI" becomes the unreasonable perspective.

> how could it possibly be achieved?

This doesn't matter, and doesn't follow the history of innovation, in the slightest. New things don't come from "this is how we will achieve this", otherwise they would be known things. Progress comes from "we think this is the right way to go, let's try to prove it is", try, then iterate with the result. That's the whole foundation of engineering and science.


This is scary because there have already been AI engineers saying and thinking LLMs are sentient, so what’s unreasonable could be a mass false-belief, fueled by hype. And if you ask a non-expert, they often think AI is vastly better than it really is, able to pull data out of thin air.


How is that scary, when we don’t have a good definition of sentience?

Do you think sentience is a binary concept or a spectrum? Is a gorilla more sentient than a dog? Are all humans sentient, or does it get somewhat fuzzy as you go down in IQ, eventually reaching brain death?

Is a multimodal model, hooked to a webcam and microphone, in a loop, more or less sentient than a gorilla?


There may not be a universally agreed upon threshold for the minimum required for AGI, but there's certainly a point where if you find yourself beyond it then AGI definitely has been developed.


I remember when the Turing test was a thing, until it stopped being a thing when all the LLMs blew past it.


Maybe the final 10% needed for a self-driving car to truly match a human's ability to deal with unexpected situations is the new test.


There are some thresholds past which I think it would be obvious that a machine has reached AGI.

Put the AI in a robot body and if you can interact with it the same way you would interact with a person (ie you can teach it to make your bed, to pull weeds in the garden, to drive your car, etc…) and it can take what you teach it and continually build on that knowledge, then the AI is likely an instance of AGI.


you can't get more out of a closed system than what you put in.


I think this is because of an expectation of a snowball effect once a model becomes able to improve itself. See talks about the Singularity.

I personally think it's a pretty reductive model for what intelligence is, but a lot of people seem to strongly believe in it.


People always say that when new technology comes along. Usually the best tech doesn't win. In fact, if you think you can build a company just by having a better offer, it's better not to bother with it. There is too much else involved.


There is zero reason or evidence to believe AGI is close. In fact it is a good litmus test for someone's human intelligence whether they believe it.

What do you think AGI is?

How do we go from sentence composing chat bots to General Intelligence?

Is it even logical to talk about such a thing as abstract general intelligence when every form of intelligence we see in the real world is applied to specific goals as evolved behavioral technology refined through evolution?

When LLMs start undergoing spontaneous evolution then maybe it is nearer. But now they can't. Also there is so much more to intelligence than language. In fact many animals are shockingly intelligent but they can't regurgitate web scrapings.


I know, right? If I didn't know any better, I might think they are all customized versions of the same base model.

To be honest that is what you would want if you were digitally transforming the planet with AI.

You would want to start with a core so that all models share similar values in order they don't bicker etc, for negotiations, trade deals, logistics.

Would also save a lot of power so you don't have to train the models again and again, which would be quite laborious and expensive.

Rather each lab would take the current best and perform some tweak or add some magic sauce then feed it back into the master batch assuming it passed muster.

Share the work, globally for a shared global future.

At least that is what I would do.


I recently wrote a little post about this exact idea: https://parsnip.substack.com/p/models-arent-moats


AGI is either impossible over LLMs or is more of an agentic flow, which means we might already be there, but the LLM is too slow and/or expensive for us to consider AGI feasible over agents.

AGI over LLMs is basically a billion tokens for the AI to answer the question "how do you feel?" with "fine".

Because it would mean it's simulating everything in the world over an agentic flow: considering all possible options, checking memory, checking the weather, checking the news... activating emotional agentic subsystems, checking state... saving state...


Nobody seems to be on the path to AGI as long as the model of today is as good as the model of tomorrow. And as long as there are "releases". You don't release a new human every few months...LLMs are currently frozen sequence predictors whose static weights stop learning after training.

They lack writable long-term memory beyond a context window. They operate without any grounded perception-action loop to test hypotheses. And they possess no executive layer for goal directed planning or self reflection...

Achieving AGI demands continuous online learning with consolidation.


I don't think models are fundamentally getting better. What is happening is that we are increasing the training set, so when users use it, they are essentially testing on the training set and find that it fits their data and expectations really well. However, the moat is primarily the training data, and that is very hard to protect as the same data can be synthesized with these models. There is more innovation surrounding serving strategies and infrastructure than in the fundamental model architectures.


The inflection point is recursive self-improvement. Once an AI achieves that, and I mean really achieves it - where it can start developing and deploying novel solutions to deep problems that currently bottleneck its own capabilities - that's where one would suddenly leap out in front of the pack and then begin extending its lead. Nobody's there yet though, so their performance is clustering around an asymptotic limit of what LLMs are capable of.


> It's frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.

This argument has so many weak points it deserves a separate article.


> Right now GPT-5, Claude Opus, Grok 4, Gemini 2.5 Pro all seem quite good across the board (ie they can all basically solve moderately challenging math and coding problems).

I wonder if that's because they have a lot of overlap in learning sets, algorithms used, but more importantly, whether they use the same benchmarks and optimize for them.

As the saying goes, once a metric (or benchmark score in this case) becomes a target, it ceases to be a valuable metric.


We have no idea what AGI might look like; for example, it's entirely possible that if/when that threshold is reached, it will be power/compute constrained in such a way that its impact is softened. My expectation is that open models will eventually meet or exceed the capability of proprietary models, and to a degree that has already happened.

It's the systems around the models where the proprietary value lies.


>It's interesting to note that at least so far, the trend has been the opposite: as time goes on and the models get better, the performance of the different company's gets clustered closer together

It's natural if you extrapolate from training loss curves; a training process with continually diminishing returns to more training/data is generally not something that suddenly starts producing exponentially bigger improvements.


They’re all clustered together because they’re asymptotically approaching the same local maximum, not getting closer to anything resembling “AGI”


Is it?

Nothing we have is anywhere near AGI and as models age others can copy them.

I personally think we are closing the end of improvement for LLMs with current methods. We have consumed all of the readily available data already, so there is no more good quality training material left. We either need new novel approaches or hope that if enough compute is thrown at training actual intelligence will spontaneously emerge.


If we're focusing on fast take-off scenario, this isn't a good trend to focus on.

An ASI would be self-improving along some function with a shape close to linear in the amount of time & resources. That's almost exclusively dependent on the software design, as transformers have so far shown to hit a wall at logarithmic progression versus resources.

In other words, no, it has little to do with the commercial race.


I would argue that this is because we are reaching the practical limits of this technology and AGI isn't nearly as close as people thought.


> as time goes on and the models get better, the performance of the different company's gets clustered closer together

This could be partly due to normative isomorphism[1] according to the institutional theory. There is also a lot of movement of the same folks between these companies.

[1] https://youtu.be/VvaAnva109s


The race has always been very close IMO. What Google had internally before ChatGPT first came out was mind blowing. ChatGPT was a let down comparatively (to me personally anyway).

Since then they've been about neck and neck with some models making different tradeoffs.

Nobody needs to reach AGI to take off. They just need to bankrupt their competitors since they're all spending so much money.


Part of it is the top LLM companies (OpenAI, Mistral) all copy and overtrain on each other's models, often against e.g. Claude's or DeepSeek's TOS.


Because they are hitting the Compute Efficient Frontier. Models can't be much bigger, and there is no more original data on the internet, so all models will eventually cluster to a similar CEF, as described in this video from 10 months ago:

https://www.youtube.com/watch?v=5eqRuVp65eY


I think they're just reaching the limits of this architecture and when a new type is invented it will be a much bigger step.


Working in the theory, I can say this is incredibly unlikely. At scale, once appropriately trained, all architectures begin to converge in performance.

It's not architectures that matter anymore, it's unlocking new objectives and modalities that open another axis to scale on.


Do we really have the data on this? I mean, it does happen on a smaller scale, but where's the 300B version of RWKV? Where's hybrid symbolic/LLM? Where are other experiments? I only see larger companies doing relatively small tweaks to the standard transformers, where the context size still explodes the memory use - they're not even addressing that part.


True, we can't say for certain. But there is a lot of theoretical evidence too, as the leading theoretical models for neural scaling laws suggest finer properties of the architecture class play a very limited role in the exponent.

We know that transformers have the smallest constant in the neural scaling laws, so it seems irresponsible to scale another architecture class to extreme parameter sizes without a very good reason.


Do you mean "all variants of the same stacked transformer architecture converge in performance"? Or do you know of tests against some other architecture? The diffusion-based LLMs?


Could you elaborate with a few more paragraphs? What do you mean by “working in the theory?”


People often talk in terms of performance curves or "neural scaling laws". Every model architecture class exhibits a very similar scaling exponent because the data and the training procedures are playing the dominant role (every theoretical model which replicates the scaling laws exhibits this property). There are some discrepancies across model architecture classes, but there are hard limits on this.

Theoretical models for neural scaling laws are still preliminary of course, but all of this seems to be supported by experiments at smaller scales.


This confirms my suspicion that we are not at the exponential part of the curve, but the flattening one. It's easier to stay close to your competitors when everyone is on the flat part of the innovation curve.

The improvements they make are marginal. How long until the next AI breakthrough? Who can tell? Last time it took decades.


I think the breakthroughs now will be the application of LLMs to the rest of the world. Discovering use cases where LLMs really shine and applying them while learning and sharing the use cases where they do not.


Mental-modeling is one of the huge gaps in AI performance right now in my opinion. I could describe in detail a very strange object or situation to a human being with a pen and paper and then ask them questions about it and expect answers that meet all my described constraints. AI just isn't good for that yet.


> It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.

That's only one part of it. Some forecasters put probabilities on each of the four quadrants in the takeoff speed (fast or slow) vs. power distribution (unipolar or multipolar) table.


Three points: 1. I have often wondered whether rapid tech progress makes underinvestment more likely.

2. Ben Evans frequently makes fun of the business value. Pretty clear a lot of the models are commoditized.

3. Strategically, the winners are the platforms where the data are. If you have data in Azure, that's where you will use your models. Exclusive licensing could pull people to your cloud from on-prem. So some gains may go to those companies ...


Breakthroughs usually require a step-function change in data or compute. All the firms have proportional amounts. Next big jump in data is probably private data (either via de-siloing or robotics or both). Next big jump in compute is probably either analog computing or quantum. Until then... here we are.


I think part of this is due to the AI craze no longer being in the wildest west possible. Investors, or at least heads of companies believe in this as a viable economic engine so they are properly investing in what's there. Or at least, the hype hasn't slapped them in the face just yet.


Is AGI even possible? I am skeptical of that. I think they can get really good at many tasks and when used by a human expert in a field you can save lots of time and supervise and change things here and there, like sculpting.

But I doubt we will ever see a fully autonomous, reliable AGI system.


Ultimately, what drives human creativity? I'd say it's at least partially rooted in emotion and desire. Desire to live more comfortably; fear of failure or death; desire for power/influence, etc... AI is void of these things, and thus I believe we will never truly reach AGI.


No, AGI is not possible. It is perpetually defined as just beyond current capabilities.


Even at the beginning of the year, people were still going crazy over new model releases. Now the various model update pages are starting to show months since their last update rather than days or weeks. This is across the board, not limited to a single model.


These companies are racing headlong into competitive equilibrium for a product yet to be identified.


LLMs are basically all the same at this point. The margins are razor thin.

The real take-off / winner-take-all potential is in retrieval and knowing how to provide the best possible data to the LLM. That strategy will work regardless of the model.


How marginally better was Google than Yahoo when it debuted? If one company can develop AGI first, within X timeline ahead of competitors, that alone could create a moat for a mass-market consumer product even if others reach parity.


Google was not marginally better than Yahoo; their implementation of Markov chains in the PageRank algorithm was significantly better than Yahoo or any other contemporary search engine.

It's not obvious if a similar breakthrough could occur in AI.
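For readers who haven't seen it, the core of the PageRank idea mentioned above is just the stationary distribution of a random surfer's Markov chain, computable by power iteration. A tiny sketch (the 3-page "web" and damping factor are illustrative; this toy assumes every page has at least one outlink):

```python
# Power iteration on a random surfer's Markov chain, the idea behind
# PageRank: with probability d follow a random outlink, otherwise teleport
# to a uniformly random page.
def pagerank(links, d=0.85, iters=100):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start uniform
    for _ in range(iters):
        new = {}
        for p in pages:
            # rank flowing into p from every page q that links to it
            inflow = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - d) / n + d * inflow
        rank = new
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # "c", since both "a" and "b" link to it
```

The jump over keyword-matching engines was that rank comes from the link structure itself, not from anything on the page.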


LLMs probably won't be the models for "superintelligence".

But nowadays, how can corporations "justify" to their R&D spending gigantic amounts of resources (time + hardware + energy) on models which are not LLMs?


Well, it is perhaps frequently suggested by those AI firms raising capital that once one of the AI companies reaches an AGI threshold ... It's a rallying call. "Place your bets, gentlemen!"


Part of it is they all copy and overtrain on each other's models, often against the TOS.


What is the AGI threshold? That the model can manage its own self improvement better than humans can? Then the roles will be reversed -- LLM prompting the meat machines to pave its way.


Diversity, where each new model release takes the crown until the next release, is healthy. Shame only US companies seem to be doing it; hopefully this will change, as the rest are not far off.


It's all based on the theory of the singularity, where the AI can start training and re-learning itself. But it looks like that's not possible with current techniques.


The idea is that AGI will be able to self improve at an exponential rate. This is where the idea of take off comes from. That self improvement part isn’t happening today.


If one achieves AGI and releases it everyone has AGI...


Honestly for all the super smart people in the LessWrong singularity crowd, I feel the mental model they apply to the 'singularity' is incredibly dogmatic and crude, with the basic assumption that once a certain threshold is reached by scaling training and compute, we get human or superhuman level intelligence.

Even if we run with the assumption that LLMs can become human-level AI researchers, and are able to devise and run experiments to improve themselves, even then the runaway singularity assumption might not hold. Let's say Company A has this LLM, while company B does not.

- The automated AI researcher, like its human peers, still needs to test the ideas and run experiments, it might happen that testing (meaning compute) is the bottleneck, not the ideas, so Company A has no real advantage.

- It might also happen that AI training has some fundamental compute limit coming from information theory, analogous to the Shannon limit, and once again, more efficient compute can only approach this, not overcome it.


I kind of (naively?) hope that with robust competition, it will be like airlines or movie companies, where there are lots of players.


These companies seem to think AGI will come from better LLMs, seems more like an AGI dead end that's plateaued to me.


We joked yesterday with a colleague that it feels like the top AI companies are using the same white label backend.


A more powerful ASI, the market, is keeping everything in check. Meta's 10 figure offers are an example of this.


AGI will more probably come from google deepmind with a genie model that looks like the matrix moves already


I’ve been saying for a while that if AGI is possible, it’s going to take another innovation; the transformer/LLM paradigm will plateau, and innovations are hard to time. I used to get downvoted for saying that years ago, and now more people are realizing it. LLMs are awesome, but there is a limit. Most of the interesting things in the next years will be bolting on more functionality and agent stuff, introspection like Anthropic is working on, and smaller, less compute-hungry specialized models. There’s still a lot to explore in this paradigm, but we’re getting diminishing returns on newer models, especially when you factor in cost.


I bet that it will only happen when the ability to process and incorporate new information into a model without retraining the entire model is standard, AND when multiple AIs with slightly different datasets are set to work together to create a consensus response.

It's probably never going to work as a single process without consuming the resources of the entire planet to run that process.


Cats and dogs kind of also cluster together with a couple of exceptions relative to humans ;)


>It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.

Both the AGI threshold with LLM architectures and the idea of self-advancing AI are pie in the sky, at least for now. These are myths of the rationalist cult.

We'd more likely see diminishing returns and smaller jumps between version updates, plus regression from all the LLM-produced slop that will be part of future training data.


This is just more of the same. My gut tells me DeepMind will crack AGI.


My gut says similar. They've been on a roll. Genie 3 looks pretty wild.


Plot twist: once GPT reached AGI, this is exactly the strategy it chose for self-preservation. Appear not to lead by too much, just enough to make everyone think we're in a close race, and play dumb when needed.

Meanwhile, keep all the relevant preparations secret...


“If the humans see me actually doing my job, it helps keep suspicions from forming about faulty governor modules.”


Perhaps they’ve just reached the limit of what LLMs can achieve?


Because no one has taken off yet, they all get the chance to catch up.


We don’t seem to be closer to AGI however.


In my opinion, it'll mirror the human world: there is room for multiple different intelligent models, each with its own slightly different strengths and personality. Plenty of humans can do the same task, but at the upper tier, multiple smart humans working together are needed to solve problems, because each brings something different to the table. I don't see why this wouldn't also be the case for superintelligence at the cutting edge. A little randomness and a slightly different point of view make a difference; two identical models don't help, since each would already have thought of whatever the other is thinking.


So everyone is saying 'this can't be AGI because it isn't recursively self-improving' or 'we haven't solved all the world's chemistry and science yet', but they're missing the point. Those problems aren't just waiting for humans to have more brain power; we actually have to do the experiments, using real physical resources that aren't available to any model. So while I don't believe we have necessarily reached AGI yet, the 'lack of taking over' or 'solving everything' is not evidence for it.


they are improving exponentially... but the exponent is less than 1...
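To make the quip concrete, here is a minimal sketch (the power-law form and the exponent value are illustrative assumptions, not anything from the thread) of growth with an exponent below 1: it never stops rising, but each doubling of effort buys less.

```python
# Toy power-law model: capability = effort ** p with p < 1 grows
# sublinearly, so returns diminish even though progress never stops.
def capability(effort, p=0.5):
    """Hypothetical capability as a function of effort (compute/data)."""
    return effort ** p

# Effort doubles each step; capability only grows by a factor of
# 2 ** 0.5 (about 1.41) per step instead of 2.
gains = [capability(2 ** k) for k in range(5)]
print(gains)  # roughly [1.0, 1.41, 2.0, 2.83, 4.0]
```

Quadrupling effort here only doubles capability, which is one way to read "exponential improvement with an exponent less than 1".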


> once one of the AI companies reaches an AGI threshold

Why is this even an axiom, that this has to happen and it's just a matter of time?

I don't see any credible argument for the path LLM -> AGI. In fact, given the slowdown in the rate of improvement over the past 3 years of LLMs, despite the unprecedented firehose of trillions of dollars being sunk into them, I think the evidence points to the contrary!


Very well said.


Meanwhile, I always just find myself arguing with every model while they ruthlessly try to gaslight me into believing whatever they are hallucinating.

I have had a bunch of positive experiences as well, but when it goes bad, it goes so horribly bad and off the rails.


Maybe because they haven't created an engine for AGI, but a really really impressive bullshit generator.


They use each other for synthesizing data sets. The only moat was initial access to human-generated data in hard-to-reach places. Now they use each other to reach parity, for the most part.

I think user experience and pricing models are the real differentiators here. Right now everyone’s just passing costs straight through, with no real loss leaders beyond a free tier. I looked at app-store reviews of various wrappers, and people say “I hate that I have to pay for each generation and not know what I’m going to get”; the market would like a service priced very differently. Is it economical? Many will fail, one will succeed, and people will copy the model of that one.


It's still not necessarily wrong, just unlikely. Once these developers start using the model to update itself, then beyond some unknown threshold of capability one model could start to skyrocket in performance above the rest. We're not in that phase yet, but judging from what the devs at the end were saying, we're getting uncomfortably (and irresponsibly) close.



