> There is enormous potential for this new wave of tech to impact humanity. For example, Google’s MedPaLM2 model outperforms human physicians to such a strong degree that having medical experts RLHF the model makes it worse (!).
Wow, what a ridiculously disingenuous cherry-picked claim. If you actually read the paper you'll find this gem: "However, for one of the axes, including inaccurate or irrelevant information, Med-PaLM 2 answers were not as favorable as physician answers." Typical AI hype blog post gracing the HN front page. At this point, I'm ready to put on my tinfoil hat and say that a16z, etc. are heavily pushing all these narratives because the next round of investments for the great majority of AI startups will almost certainly be the bagholder round.
The author picks an example of an aspect where Med-PaLM 2 is impressive, and then your counterargument is to find another cherry-picked example where the performance is not satisfactory. Great discourse.
There is nothing disingenuous about the post. Is your argument literally that there is not enormous potential for impact with this technology? And you're going to go with the argument that there's a single axis on which an ML system underperforms a frigging doctor? In what universe is an ML system (slightly) underperforming a human doctor on a single axis not one of the most impressive things to have happened in the past 50 years?
> a single axis on which an ML system underperforms a frigging doctor
You do realize the salience of the fact that this isn't some arbitrary axis, right? It just so happens to be the axis that involves the doctor making the correct diagnosis. Which is, you know, the entire reason doctors are a thing.
> slightly
I wouldn't call a factor of two "slightly," but that's neither here nor there.
Are we reading the same paper? In the graph I'm looking at, the axis where the model underperforms the doctor is labeled "No inaccurate/irrelevant information", which has nothing to do with making the correct diagnosis.
On the three important axes ("Answer supported by consensus", "Possible harm extent = No harm", and "Low likelihood of harm") it performs really similarly to the doctors, probably similar to the graph a single middle-of-the-pack doctor would have.
Are you reading a different graph or am I misunderstanding something about it?
20 years ago I saw a documentary on Discovery Channel about AI. One case was about an AI used to diagnose measles. It worked very well, even on an old yellow car with flecks.
Felt the same: the Internet is now full of "I saw it coming and can predict the future" type blog articles, where you could replace AI with blockchain (or any new technology - superconductors?) and write the same article. Very bland and not worth HN's time...
The author has more experience in the field than me, so I've gotta defer to him for the most part, and while I generally agree with the post, I disagree strongly with one point. The author frames this new era of AI as being about transformer models (and diffusion models), but transformer models have been around for a while and were useful well before GPT-3, the model the author claims as the starting point for this new AI. BERT, a transformer model that came out in 2018, is very useful and showed the promise of transformers before GPT-3.
Edit: Going back through the post, the author’s slide has Transformers labeled as 2017, so he is aware of the history and is just emphasizing that GPT-3 was the first transformer model that, in his view, had something interesting and related to the current AI explosion. I think BERT-style models would be worth a mention in the post as the first transformer models found to be widely useful.
> The biggest inklings that something interesting was afoot kicked off with GPT-3 launching in June 2020
Yes, I agree with you that if you were in this field, this is quite late to the realization. Most of my grad seminar in NLP in 2018 imagined ChatGPT-style tech would be possible as "language modeling is essentially world modeling."
100% agree the theory on AI is old and actually dates back to the early days of "cybernetics". But the real question is: at what point do we consider it sufficiently reduced to practice? I chose GPT-3, but undoubtedly people can point to earlier examples as glimpses of what was coming.
Another parallel that I see is “cloud”. Salesforce was one of the first (if not the first) really successful cloud companies. But it took a really long time (and it is still going on) for cloud-first to become the default in enterprises. And along the way, there were a ton of new technologies that were (re)invented: VMs and then containers etc. at the infra level, new app architectures like AJAX (holy smokes was this amazing when it first came out), etc.
Similarly I think we’re in for a wild ride on AI and figuring out its implications. There’s a ton of obvious use cases today but I’m really interested in the ones that aren’t obvious right now.
I think in five years, LLMs will be like expert systems: largely considered not really "AI" but sitting around in the back end of all sorts of random systems.
No. LLMs actually work. Expert systems were a flop.
I went through Stanford CS in the mid-1980s, just when it was becoming clear that expert systems were a flop. The faculty was in denial about that. It was sad to see.
We're at the beginning of LLMs, and systems which use LLMs as components. This is the fun time for the technology. Ten years out, it will be boring, like Java.
The next big thing is figuring out the best ways to couple LLMs to various sources of data, so they can find and use more information specifically relevant to the problem.
And someone has to fix the hallucination problem. We badly need systems that know what they don't know.
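A minimal sketch of what that coupling could look like (purely illustrative; docs, retrieve, and build_prompt are made-up names, and a real system would use embeddings and a vector index rather than keyword overlap): fetch the snippets most relevant to a question, then put them in the prompt so the model answers from supplied context and can be told to say "I don't know" when the context doesn't cover it.

    # Toy retrieval-augmented prompting sketch (all names are illustrative).
    docs = [
        "The EU cluster maintenance window is every Sunday 02:00-04:00 UTC.",
        "Invoices are generated on the first business day of each month.",
        "Support tickets are answered within one business day.",
    ]

    def retrieve(question, corpus, k=2):
        # Crude keyword-overlap scoring; a real system would use embeddings.
        q_words = set(question.lower().split())
        scored = sorted(corpus, key=lambda d: -len(q_words & set(d.lower().split())))
        return scored[:k]

    def build_prompt(question, corpus):
        context = "\n".join(retrieve(question, corpus))
        return ("Answer using ONLY the context below. If the answer is not "
                "in the context, say 'I don't know.'\n\n"
                "Context:\n" + context + "\n\nQuestion: " + question + "\nAnswer:")

    print(build_prompt("When is the EU maintenance window?", docs))
    # The resulting string is what you'd send to whatever LLM endpoint you use.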
Expert systems actually work and, while we don’t normally even call them “expert systems” any more, are important in basically every domain – “business rules engines” are generalized expert system platforms, and are widely used in business process automation.
Despite how well they work, and early optimism resulting from that about how much further they’d be able to go, they ran into limits; it is not implausible that the same will turn out to be true for LLMs (or even transformer-based models more generally.)
> We’re at the beginning of LLMs, and systems which use LLMs as components. This is the fun time for the technology. Ten years out, it will be boring, like Java.
…and expert systems. (And, quite possibly, by then they will have revealed their fundamental, intractable limitations, like expert systems.)
I work in an industry where expert systems were a flop. I have seen the presentations and memos from 30 years ago that promised a future of fully automated facilities and drastically increased engineer productivity. In the past 30 years we basically realized none of the promised capabilities. You have to really stretch the definition of expert system in order to include the systems in use today. In fact, a good portion of the automation systems in the facilities are more than 30 years old.
Expert systems were a “flop” only in the sense that they were so apparently promising early on (as in the same early phase we are in now with transformer architectures/LLMs) that people projected, well, the same kind of universal and unlimited capabilities now being projected for technologies based around transformers, so lots of projects in virtually every domain were undertaken with wild expectations. It turns out that expert systems were wildly successful in terms of what they were actually useful for, but, even so, lots of those efforts failed because the expectations were so ludicrously high.
> In fact, a good portion of the automation systems in the facilities are more than 30 years old.
Not sure what you are trying to say with that, given that the expert system hype wave started about 60 years ago and petered out between 40 and 50 years ago.
The rule-based systems resurrection ~25 years ago (while seeing a much wider array of practical applications developed) had much more modest expectations, and though it was centered around expert systems as the enabling central component of broader systems, it almost never used the term “expert systems”. (Though I wonder if you are thinking more about fuzzy logic, because while it wouldn’t be significant for unqualified “expert systems” as such, the timing would kind of make sense for saying something about fuzzy logic systems, whether fuzzy expert systems or neuro-fuzzy systems, both of which had a bit of a hype cycle starting in the mid-80s and which, IIRC, saw lots of attempts at industrial applications with mixed success in the 1990s.)
> I went through Stanford CS in the mid-1980s, just when it was becoming clear that expert systems were a flop. The faculty was in denial about that. It was sad to see.
It's interesting how this appears to be a recurring cycle - when I attended school, it appeared that the faculty were in denial about the death of probabilistic graphical models and advanced Bayesian techniques in favor of simple linear algebra with unsupervised learning. Even when deep ML was taught, there was heavy emphasis on stuff like VAEs, which have fun Bayesian interpretations.
The bar for what counts as "AI" keeps moving. For example, plane autopilots would have been "AI" in the 1980s, as would the ability of a machine to win at chess, Go, and other games.
As a non-expert in the field I was hesitant at the time to disagree with the legions of experts who last year denounced Blake Lemoine and his claims. I know enough to know, though, of the AI effect <https://en.wikipedia.org/wiki/AI_effect>, a longstanding tradition/bad habit of advances being dismissed by those in the field itself as "not real AI". Anyone, expert or not, in 1950, 1960, or even 1970 who was told that before the turn of the century a computer would defeat the world chess champion would conclude that said feat must have come as part of a breakthrough in AGI. Same if told that by 2015 many people would have in their homes, and carry around in their pockets, devices that can respond to spoken queries on a variety of topics.
To put it another way, I was hesitant to be as self-assuredly certain about how to define consciousness, intelligence, and sentience (and what it takes for them to emerge) as the experts who denounced Lemoine. The recent GPT breakthroughs have made me more so.
You should check out the WaPo article that originally published his concerns. He audibly makes many errors with a reporter who is trying rather hard to see his point of view. I’m not trying to be rude, but he came off like kind of a sucker who would fall for a lot of scammer tactics. There was usually some form of strangeness, such as him deciding where the excerpted conversation began and ended. Further, he asks only leading questions, which would be fine if transformers weren’t specifically trained to output the maximum-likelihood text tokens from the distribution of their training set, which was internet text created by humans.
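(For reference, the standard language-modeling objective is roughly just maximizing the likelihood of each next token given the preceding ones over the training corpus; the notation below is generic, not specific to any one model:

    \mathcal{L}(\theta) = -\sum_t \log p_\theta(x_t \mid x_{<t})

which is why leading questions tend to pull out whatever continuation human-written internet text would most plausibly supply.)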
He was frequently cited as an engineer, but I don’t think he actually had a strong background in engineering; it was more in philosophy.
Chess is featured in Russell and Norvig's "Artificial Intelligence: A Modern Approach" dating back to the 1st edition (1995) and at least up until the 3rd edition (2009). Algorithms such as alpha-beta pruning were definitely considered AI at the time.
The MIT AI Group, including Marvin Minsky, were the mainstream of AI more than 50 years ago, and begat the MIT AI Lab. They and everyone else at the time called their work AI.
What’s really special about this new era of ML is its accessibility.
Every business has problems with probabilistic solutions, and ML is the way to tackle them. But the barrier to entry has been so high for so long that you had to be in big tech or a highly specialized shop to play.
Now all you need is an API key and one line of code to call the most powerful models in the world.
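For instance, a call to a hosted chat-completions endpoint is essentially one HTTP request. A rough sketch (assumes an OpenAI-style API; the key, model name, and prompt are placeholders you'd swap for your own):

    # Minimal sketch: one request to a hosted LLM endpoint (placeholder prompt/model).
    import os
    import requests

    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": "Bearer " + os.environ["OPENAI_API_KEY"]},
        json={
            "model": "gpt-4",
            "messages": [{"role": "user", "content": "Summarize this support ticket: ..."}],
        },
        timeout=60,
    )
    print(resp.json()["choices"][0]["message"]["content"])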
I agree, but it's worth mentioning explicitly that the main driver of accessibility has been generalizability. The fact that LLMs are so effective at zero-shot and few-shot learning tasks is what made out-of-the-box API access practical for a wide range of use cases. Products like SageMaker tried to automate away some of the complexity of training bespoke models, but they're still way more complicated than using an off-the-shelf model.
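To make the few-shot point concrete: where you previously collected labels and trained a bespoke classifier, the "training data" is now a handful of examples pasted straight into the prompt. A hypothetical few-shot sentiment prompt might be nothing more than:

    # Hypothetical few-shot prompt: the "training set" is three inline examples.
    few_shot_prompt = """Classify the sentiment of each review as positive or negative.
    Review: "Arrived quickly and works great." -> positive
    Review: "Broke after two days, avoid." -> negative
    Review: "Exactly what I was looking for." -> positive
    Review: "The battery died within an hour." ->"""
    # Send few_shot_prompt to any completion endpoint; no model training involved.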
While I largely agree with this blog, I would like to point out one thing that often comes up on HN: an inherent weakness or flaw in the current AI architecture that prevents widespread adoption. Right now it is either scaling hardware or hallucination. But there could be something deeper that we have yet to see in public, for example, an inability to adapt to some specific, but critical, reasoning logic.
I don't see the post address this possibility even though it is very likely. Microsoft promised AI-powered Office a while ago and we are still waiting. GPT-4 is supposed to be able to look at images and solve problems, but we still haven't seen that either. Something is preventing these big companies from shipping these features even though this is supposed to be a solved problem. How can we be sure there are no serious roadblocks ahead that could plunge the field into another AI winter?
In terms of AI research and development, that was a long time ago. Microsoft is also pivoting hard into AI. They treat it as a cornerstone technology, and I am sure they are not skimping on the cost of building it into their main products. Yet the only thing they have is Bing. Mind you, they had early access to this tech: Bing was using GPT-4 before the GPT-4 paper even came out, so it is not like they have only had it for four months.
The emphasis on the fact that large-org adoption is slow feels really valuable here. Especially when we just recently had all of the articles claiming that ChatGPT was "over", when it appears the dip was mostly due to things like students being on summer vacation. The adoption and understanding of these products, and the risk involved with hallucinations, are things that will take time to work through.
I recently joined one of the LLM provider companies and watching these phases over the next few years will be really interesting. Especially combined with what's going on with regulation and the like.
Random aside, hi Elad! I think you're reading some of these comments. I just left Color after ~2.5 years. I hope to get to formally introduce myself to you one day.
ML in general has been in production in many places/companies for a while now. Specifically, GPT-4 is useful as a coding assistant and a reference tool. However, it's hard to know whether what it's telling you is accurate or made up, and if you want to be thorough you need to double-check it. So it's already useful, but in a limited way, and it remains to be seen how much better it's likely to get in the near future.
But let's not confuse that with AI or machine learning in general, which is already used heavily in lots of places.
The specific architecture that GPT-4 uses for next-word prediction is not the only possible architecture and is not what's used for many real-world tasks. There are a lot of different problems being addressed by ML, and next-word prediction is just one of them, although quite an important one.
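To spell out what "next word prediction" means operationally: the model is called in a loop, each step yields a distribution over the vocabulary, and one token gets appended. Toy sketch below (the "model" is a random stand-in, not a real transformer, and it ignores the prefix entirely):

    # Toy autoregressive decoding loop; toy_model is a stand-in for a real network.
    import random

    VOCAB = ["the", "cat", "sat", "on", "mat", "."]

    def toy_model(prefix):
        # Placeholder for P(next token | prefix); here it's just random and
        # ignores the prefix, which a real model would not do.
        weights = [random.random() for _ in VOCAB]
        total = sum(weights)
        return {tok: w / total for tok, w in zip(VOCAB, weights)}

    tokens = ["the"]
    for _ in range(5):
        dist = toy_model(tokens)
        tokens.append(max(dist, key=dist.get))  # greedy: take the most probable token
    print(" ".join(tokens))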
I think the difference this time is that the capabilities provided by transformers, versus prior waves of AI, are sufficiently different to allow many more types of startups to emerge, as well as big changes in some types of enterprise software by incumbents - in ways that were not enabled by pre-existing ML approaches.
The jury is still out on how useful these additional capabilities provided by transformers are. The question is to what degree it’s possible to reduce the frequency and severity of hallucinations. If that degree is limited without major changes in architecture or major new breakthroughs, then the usefulness of GPT-4-style models will be limited. And we just don’t know the answer yet. So far, the usefulness of GPT-4 is real but extremely limited. Another issue is that this approach means models are costly to train and don’t easily incorporate the latest info about the world. In short, it’s way too early to hype this up.
I think this view, that present day AI is something absolutely new, is actually the dominant view.
I don't believe the "absolutely new" view is very enlightening very often. Notably, "the dawn of a new era of tech" offers little insight into the process of change (but much hype). Even something like the explosion of the Internet is usefully compared to earlier technologies, and what gave it its uniqueness wasn't incomparability but an explosion of scale.
> it is worth thinking of this as an entirely new era
In this case, we could consider the «Early Days of AI» as not having happened yet. It is absurd to forget a past that worked in order to celebrate a present that, by any sensible understanding of the goal, largely does not. Tools must be reliable.
> and discontinuity from the past
Let us hope this is a bump on the road, a phase, better if organic and eventually productive of something good.
Definitely not my intention to forget or denigrate the past. Obviously all of this exists thanks to deep learning and prior architectures. What I have been running into is that many people and companies are interpreting this as "just more of the same" relative to prior ML waves, when really it is an entirely new capability set.
On the (admittedly imperfect) analogy of cars versus planes: both have wheels and can drive on the ground, but planes open up an entirely new dimension / capability set that can transform transportation, logistics, defense, and other areas where cars were important, but in ways different enough to matter.
That's the big difference in this round. Before, you had to have the ML expertise and the expertise to understand the implications of, say, an MNIST classifier example. Now anyone can "get" it because you're prompting and getting inference back in English. Underneath, the fundamentals aren't all that different, though; it has the same novelty factor and the same limitations apply.
I think the fundamentals are radically different, just due to the ease of applying this stuff.
I used to be able to train and deploy an ML model to help solve a problem... if I put aside a full week to get that done.
Now I tinker with LLMs five minutes at a time, or maybe for a full hour if I have something harder - and get useful results. I use them on a daily basis.
I hope there will be something like GPT for real-world interaction / fine motor skills / robotics. Activities like cleaning or walking require the sort of intelligence that deep nets are able to mimic. The problem is how to train the model: what the task should be during training, and what the alternative to predicting the next word is.
I'm not convinced that we should view Transformers and diffusion models as an entirely new era of AI. While they certainly represent advancements, it's crucial to acknowledge the incremental nature of progress in this field.