I have come to two conclusions about the GPT technologies after a few weeks of chewing on this:
1. We are so amazed by its ability to babble in a confident manner that we are asking it to do things that it should not be asked to do. GPT is basically the language portion of your brain. The language portion of your brain does not do logic. It does not do analyses. But if you built something very like it and asked it to try, it might give it a good go.
In its current state, you really shouldn't rely on it for anything. But people will, and, as the complement of the Wile E. Coyote effect, I think we're going to see a lot of people not realize they've run off the cliff, crashed into several rocks on the way down, and burst into flames, until after they've done it several dozen times. Only then will they look back and realize what a cockup they've made by depending on these GPT-line AIs.
To put it in code assistant terms, I expect people to be increasingly amazed at how well they seem to be coding, until they put the results together at scale and realize that while it kinda, sorta works, it is a new type of never-before-seen crap code that nobody can or will be able to debug short of throwing it away and starting over.
This is not because GPT is broken. It is because what it actually is doesn't match what we are asking it to do.
2. My second conclusion is that this hype train is going to crash and sour people quite badly on "AI", because of the pervasive belief I have seen even here on HN that this GPT line of AIs is AI. Many people believe that this is the beginning and the end of AI, that anything true of interacting with GPT is true of AIs in general, etc.
So people are going to be even more blindsided when someone develops an AI that uses GPT as its language comprehension component but does the higher-level stuff we actually want on top of it. Because in my opinion, it's pretty clear that GPT achieves an amazing level of comprehension of what a series of words means. The problem is, that's all it is really doing. This accomplishment should not be understated. It just happens that we're basically abusing it in its current form.
What it's going to do as a part of an AI, rather than the whole thing, is going to be amazing. This is certainly one of the hard problems of building a "real AI" that is, at least to a first approximation, solved. Holy crap, what times we live in.
But we do not have this AI yet, even though we think we do.
I love the mental model of GPT as only one part of the brain, but I believe that the integration of other "parts" of the brain will come sooner than you think. See, for instance, https://twitter.com/mathemagic1an/status/1624870248221663232 / https://arxiv.org/abs/2302.04761 where the language model is used to create training data that allows it to emit tokens that function as lookup oracles by interacting with external APIs. And an LLM can itself understand when a document is internally inconsistent, relative to other documents, so it can integrate the results of these oracles if properly trained to do so. We're only at the surface of what's possible here!
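To make the "tokens that function as lookup oracles" idea concrete, here is a minimal Python sketch of only the execution half. It assumes the model has already emitted inline calls in a bracketed `[Tool(args)]` form (loosely modeled on the notation in the linked paper), runs each call with a registered tool, and splices the result back into the text. `Calculator`, `TOOLS` and `run_tool_calls` are hypothetical names for illustration; the training side (how the model learns where to put the calls) is not shown.

```python
import ast
import operator
import re

# Arithmetic operators the toy calculator accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expr: str) -> str:
    """Hypothetical tool: safely evaluate + - * / on numeric literals."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")

    return str(round(ev(ast.parse(expr, mode="eval")), 2))


# Registry of tools the model is allowed to call (hypothetical).
TOOLS = {"Calculator": calculator}

# Matches inline calls such as "[Calculator(400 / 1400)]".
CALL = re.compile(r"\[(\w+)\(([^)]*)\)\]")


def run_tool_calls(text: str) -> str:
    """Execute each recognised call and splice its result into the text."""
    def replace(match: re.Match) -> str:
        name, arg = match.group(1), match.group(2)
        if name not in TOOLS:
            return match.group(0)  # leave unknown tools untouched
        return f"[{name}({arg}) -> {TOOLS[name](arg)}]"

    return CALL.sub(replace, text)
```

For example, `run_tool_calls("Out of 1400 participants, 400 [Calculator(400 / 1400)] passed.")` returns the sentence with `[Calculator(400 / 1400) -> 0.29]` in place of the bare call, so the model can keep generating with the looked-up value in its context.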
I also look to the example of self-driving cars - just because Tesla over-promised, that didn't discourage its competitors from moving forward slowly but surely. It's hard to pick a winner right now, though - so much culturally in big tech is up in the air with the simultaneity of layoffs and this sea change in AI viability, it's hard to know who will be first to release something that truly feels rock-solid.
There's one challenging thought experiment for the future of "AI", in anything remotely like how we are currently approaching it.
Put yourself in the shoes of primitive man. Kind of a weird saying given we wouldn't have had shoes, but bear with me here! Not long ago we lacked any language whatsoever, and the state of our art in technology was bashing stones together to use the resultant pointy pieces as weapons. Somehow we moved from that to Shakespeare, putting a man on the moon, and discovering the secrets of the atom - and its great and awful applications. And we did it all extremely quickly.
Now imagine you, primitive man, somehow trained an LLM on all quantifiable knowledge of the times. It should be somewhat self-evident that it's not really going to lead you to the atom, Shakespeare, or anywhere beyond bashing stones together. Current LLMs are basically just playing 'guess the next word.' When that next word has not yet been spoken by mankind (figuratively speaking, but perhaps also literally to some degree), the LLM will never guess it.
Natural language search is a really awesome tool that will be able to help in many different fields. But I feel like in many ways we're alchemists trying to turn lead into gold. And we've just discovered how to create gold colored paint. It would feel like a monumental leap, but in reality you'd be no closer to your goal than you were the year prior. That said, paint also has lots of really great uses - but it's not what you're trying to do.
Yes, this is something that I've been thinking ever since GPT3 came out.
It's insanely impressive what it can do given it's just a language model. But if you start gluing on more components, we could end up with a more or less sentient AGI within a few years.
Bing have already hooked it up to a search engine. That post hooks it up to other tools.
I think what is needed next is a long-term memory where it can store dynamic facts and smartly retrieve them later, rather than relying on just the current 4000-token window.
It needs to be able to tell when a user is circling back to a topic they talked about months ago and pull out the relevant summaries of that conversation.
I also think it needs a working memory: it should continually edit the token window to fit the relevant state of the conversation, summarising recent tokens, saving things out to long-term storage, and pulling new information in from long-term storage, web searches and other tools.
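A toy Python sketch of the working-memory idea, under loud assumptions: word count stands in for a real tokenizer, the oldest turns are evicted verbatim to a long-term archive once a token budget is exceeded (the summarising step is left out), and retrieval is naive word overlap rather than anything like embeddings. `WorkingMemory` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


def n_tokens(text: str) -> int:
    # Stand-in for a real tokenizer: one "token" per whitespace-separated word.
    return len(text.split())


@dataclass
class WorkingMemory:
    budget: int                                        # max tokens in the live window
    window: List[str] = field(default_factory=list)    # recent turns, kept verbatim
    archive: List[str] = field(default_factory=list)   # long-term store of evicted turns

    def add(self, turn: str) -> None:
        """Append a turn, evicting the oldest turns to the archive while over budget."""
        self.window.append(turn)
        while sum(n_tokens(t) for t in self.window) > self.budget and len(self.window) > 1:
            self.archive.append(self.window.pop(0))

    def recall(self, query: str, k: int = 1) -> List[str]:
        """Return up to k archived turns sharing at least one word with the query."""
        q = set(query.lower().split())

        def overlap(turn: str) -> int:
            return len(q & set(turn.lower().split()))

        ranked = sorted(self.archive, key=overlap, reverse=True)
        return [t for t in ranked[:k] if overlap(t) > 0]

    def context(self, query: str) -> List[str]:
        """What would be fed to the model: recalled memories, then the live window."""
        return self.recall(query) + self.window
```

With `budget=6`, adding "my cat is named Tom" and then "what is the weather" pushes the first turn into the archive, and a later question like "what was my cat called" pulls it back into the context. A real system would presumably replace every piece here (tokenizer, eviction policy, retrieval) with something far smarter.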
I think a number of breakthroughs may be needed to keep an AI 'sane' with a large working memory at this point. How do we keep them 'on track', at least in a way that seems somewhat human? Humans who have halting-problem issues can either be geniuses (diving into problems and solving them to the point of ignoring their own needs) or clinical (ignoring their needs to look at a spot on the wall).
It is like we have unlocked an entirely new category of stereotyping that we never even realized existed.
Intelligence is not a prerequisite to speak fancifully.
Some other examples:
1. We generally assume that lawyers or CEOs or leaders who give well spoken and inspirational speeches actually know anything about what they're talking about.
3. Acting. Actors can easily portray smart characters by reading the right couple of sentences off a script. As audience members, we have no problem with this. But CGI is needed to make your superhero character jump off a building without becoming a pancake.
>Intelligence is not a prerequisite to speak fancifully.
I think this may be a bastardization of the word intelligence. To speak fancifully in a manner accepted by the audience requires some kind of ordered information processing and an understanding of the audience's taste. Typically we'd consider that intelligent, but likely Machiavellian depending on the intent.
The problem with the word intelligence is that it is too big a concept. If you look at any part of our brain, you will not find (human) intelligence itself; instead it emerges from any number of processes occurring at different scales. Until we are able to break intelligence down into these smaller, better (but not perfectly) classified pieces, we are going to keep running into the same problems over and over again.
I don't think it is possible for people to emulate the behavior of superintelligent beings. In every story about them, they appear to not actually be any smarter than us.
There is one exception - Brainwave by Poul Anderson. He had the only credible (to me) take on what super intelligent people might be like.
Rupert Sheldrake suggests that consciousness is partly about seeing possibilities for our future, evaluating them, and choosing between them. If we make decisions the same way, they change to unconscious habits.
A hungry creature can eat what it sees or stay hungry. Another has more memory and more awareness of different bark and leaves and dead animals to choose from. Another has a better memory of places with safe food in the past and how to get to them. A tool-using human can reason down longer chains like 'threaten an enemy and take their food' or 'set up a trap to kill an animal' or 'dig up root, grind root into mash, boil it, eat the gruel'. In that model, a super intelligence might be able to:
- Extract larger patterns from less information. (Con: more risk of a mistake).
- Connect more patterns or more distant patterns together with less obvious connections. (Con: risk of self-delusion).
- Evaluate longer chains of events more accurately with a larger working memory, more accurate mental models. (Con: needs more brain power, more energy, maybe longer time spent in imagination instead of defending self).
- Recall more precise memories more easily. (Con: cost of extra brain to store information and validate memories).
This would be a good model for the [fictional] Dr House: he's memorised more illnesses, he's more attentive to small details on patients, and he's better able to use those to connect to existing patterns and cut through the search space of 'all possible illnesses' to a probable diagnosis based on less information than the other doctors. They run out of ideas quicker, they know fewer diseases, and they can't evaluate chains of reasoning as long from start to conclusion, or they reach less accurate conclusions. In one episode, House meets a genius physicist/engineer and wants to get his opinion on medical cases, but the physicist declines because he doesn't have the medical training to make any sense of the cases.
It also suggests that extra intelligence might get eaten up by other people: predicting what they will do, while they use their extra intelligence to try to be unpredictable. And it would end up as exciting as a chess final, where both grandmasters sit in silence trying to out-reason their opponent through deeper chains in a larger subset of all possible moves, until eventually making a very small move. From the outside the players all seem the same, but they can reliably beat me and cannot reliably beat each other.
Actors have no problem playing smart people, but movie writers often have a LOT of trouble actually writing them. I'm still not sure that something like ChatGPT will be able to actually be clever.
I would also add that the mask is kind of coming off on CEOs and other inspirational speakers. Inspirational speaking is all a grift. They know only how they got rich (if they are even rich - most people on the speaking circuit make less than you think), not how anyone else did, and that knowledge usually doesn't translate well from the past to the future. There are a few exceptions, but most of these well-spoken people don't really know what they're talking about - they're just not self-aware enough to know that they don't know.
Yeah, I read this sentiment all the time and here's what I always say – just don't use it. Leave it to the rest of us if it's so wrong / off / bad.
BTW, have you considered maybe you aren't so good at using it? A friend has had very little luck with it, even said he's been 'arguing with it', which made me laugh. I've noticed that it's not obvious to most people that it's mostly about knowing the domain well enough to ask the right question(s). It's not magic, it won't think for you.
Here's the thing… my experience is the opposite… but maybe I'm asking it the right questions. Maybe it's more about using it to reason through your problem in a dialog, and not just ask it something you can google/duckduckgo. It seems like a LOT of people think it's a replacement for Google/search engines – it's not, it's another tool to be used correctly.
Here are some examples of successful uses for me:
I carefully explained a complex work issue that involves multiple overlapping systems and our need to get off of one of them in the middle of this mess. My team has struggled for 8 months to come up with a plan. While in a meeting the other day I got into a conversation with ChatGPT about it, carefully explained all the details and then asked it to create a plan for us to get off the system while keeping everything up / running. It spit out a 2 page, 8 point plan that is nearly 100% correct. I showed it to my team, and we made a few minor changes, and then it was anointed 'the plan' and we're actually moving forward.
THEN last night I got stuck on a funny syntax issue that googling could never find the answer to. I got into a conversation with ChatGPT about it, and after it first gave me the wrong answer, I told it that I need this solution for the latest dotnet library that follows the 'core' language syntax. It apologized! And then gave me the correct answer…
My hunch is the people that are truly irked by this are too deep / close to the subject and because it doesn't match up with what they've worked on, studied, invested time, mental energy into, well then of course it's hot garbage and 'bad'.
You all say it's solving these amazing complex tasks for you, but then don't provide any details.
Then "naysayers" like the linked article provide a whole document with images and appendixes showing it struggles with basic tasks...
So show us. For the love of god all of us would very much LIKE this technology to be good at things! Whatever techniques you're using to get these fantastical results, why don't you share them?
I can get it to provide snippets of code, CLI, toy functions that work. Beyond that, I am apparently an idiot compared to you AI-whisperers.
Also... Whatever happened to "extraordinary claims require extraordinary proof?"
An AI that creates a complex system, condensed into an actionable plan, that has stumped an entire team for 8 months is a (pardon the language) bat-shit insane claim. Things like this used to require proof to be taken seriously.
An analytic prompt contains the facts necessary for the response. This means the LLM acts as a translator.
A synthetic prompt does not contain the facts necessary for the response. This means the LLM acts as a synthesizer.
A complete baseball box score being converted into an entertaining paragraph description of the game is an analytic prompt and it will reliably produce a factual outcome.
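A minimal Python sketch of what "analytic" means in practice, assuming the box score arrives as a simple key/value dict (the field names and the Cubs/Mets data are made up). The first function puts every fact into the prompt; the second is a hypothetical cheap check that the property makes possible: any number in the model's recap that never appeared in the prompt must have been invented.

```python
import re
from typing import Dict, List


def build_analytic_prompt(box_score: Dict[str, object]) -> str:
    """Put every fact the answer needs into the prompt itself."""
    facts = "\n".join(f"{key}: {value}" for key, value in box_score.items())
    return (
        "Using ONLY the facts below, write an entertaining one-paragraph "
        "recap of the game.\n\n" + facts
    )


def unsupported_numbers(prompt: str, response: str) -> List[str]:
    """Numbers in the response that never appear in the prompt (possible inventions)."""
    allowed = set(re.findall(r"\d+", prompt))
    return sorted(set(re.findall(r"\d+", response)) - allowed)
```

With `{"home": "Cubs", "away": "Mets", "final score": "5-3", "innings": 9}`, a recap saying "won 5-3 in 9 innings" passes, while one saying "won 7-3" gets "7" flagged. A synthetic prompt offers no such reference to check against.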
Your technique of only posing analytical questions is indeed improving the results. It's not great, but I can actually get it to somewhat reliably summarize academic articles if I give it a citation now, which is pretty neat.
It doesn't summarize them well (I gave it a couple softballs, like summarizing McIntosh's "White Privilege: Unpacking the Invisible Knapsack", which almost every undergrad student in the humanities will have written about), but the stuff that it does make up is completely innocuous and not a big deal.
It’s amazing how taking time to slow down and approach things in a measured manner can lead to positive results.
It’s not at all surprising that most of the popular conversation about these tools is akin to randomly bashing into walls while attempting to push the peg into whatever “moment we need to talk about”.
What is again surprising is that HN is primarily overrun with randomly bashing into walls.
I guess I’m normally in threads about C memory arenas, a topic that probably draws more detailed thinkers in the first place.
My take: because GPT is just stochastically stringing words after each other, it is remarkably good at producing text on par with other text available on the internet. So it can produce plans, strategies, itineraries and so on. The more abstract the better. The 8 point plan is likely great.
It will much more likely fail on anything that involves precision/computation/logic. That's why it can come up with a generic strategy but fail to repeat unadjusted GAAP earnings.
I agree it's pretty good at generalities, doesn't shit the bed quite so much. Yet to suggest a plan that an entire team of professionals, who have been working for 8 months could not figure out?
It's certainly not that good, absent some amazing wizardry or some very silly professionals in a very squishy field. Yet I have no explanation for why someone would go on the internet and lie about something like that.
There were comments a while back (less so now) of people making other claims like it was solving complex functions for them and writing sophisticated software.
The entire thing baffles me. If I could get it to do that, I'd be showing you all of my marvelous works and bragging quite a bit as your newfound AI-whisperer. Hell, I'd get it to write a script for me to run that evangelized itself (edit: and me of course, as its chosen envoy to mankind) to the furthest corners of the internet!
I mean no disparagement to the OP, but maybe the team is just really bad at planning. Or perhaps they’re stretched thin and are having a hard time seeing the bigger picture.
I’m not saying such a claim doesn’t need more info, but I’ve been on teams before that lacked anyone with a good project management skillset.
There was an article not too long ago, that I'm struggling to find, that did a great job of explaining why language models are much much better suited to reverse-engineering code than they are at forward-engineering it.
I have found ChatGPT to be a valuable tool for improving the clarity and readability of my writing, particularly in my blogs and emails.
You can try this by asking questions such as "Can you improve the grammar of the following paragraphs?". You can also specify the desired tone.
It is impressive at simplifying complex technical language. Take the following sentences from a draft I wrote:
To mitigate these issues, it is recommended to simulate the effect of say n random permutations using n random hash functions (h1, h2, … hn) that map the row numbers (say 1 to k) to bucket numbers of the same range (1 to k) without a lot of collisions. This is possible if k is sufficiently large.
What ChatGPT suggested:
To address these issues, it's recommended to simulate the effect of n random permutations using n random hash functions (h1, h2, … hn). These hash functions should map the row numbers (from 1 to k) to bucket numbers within the same range (1 to k) with minimal collisions. This is achievable if the range k is large enough.
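The passage being edited describes the min-hashing trick: instead of materializing n random permutations of the rows, use n random hash functions. A minimal Python sketch, assuming rows are numbered 0 to k-1 (rather than 1 to k) and that k is prime. Under those assumptions each h(x) = (a*x + b) mod k with a nonzero is a true permutation of the row numbers, so there are no collisions at all. Function names and parameters are illustrative.

```python
import random
from typing import List, Set, Tuple


def make_hashes(n: int, k: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw n hash functions h(x) = (a*x + b) mod k; k is assumed prime."""
    rng = random.Random(seed)
    # a != 0 and k prime make each h a permutation of range(k): no collisions.
    return [(rng.randrange(1, k), rng.randrange(k)) for _ in range(n)]


def minhash_signature(rows: Set[int], hashes: List[Tuple[int, int]], k: int) -> List[int]:
    """For each hash, the smallest bucket any of the column's 1-rows lands in."""
    return [min((a * r + b) % k for r in rows) for a, b in hashes]


def estimate_jaccard(sig1: List[int], sig2: List[int]) -> float:
    """Fraction of hash functions on which the two signatures agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)
```

Comparing two columns then only needs their length-n signatures: the fraction of positions where they agree estimates the Jaccard similarity of the underlying row sets. This estimate is exact only in the extremes. Identical sets agree everywhere, and disjoint sets never agree, because a bijective hash cannot send two different rows to the same bucket.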
It replaces Grammarly (I also don't want that keylogger spyware running anywhere near my systems) entirely and provides additional features. Can Grammarly also write the Haikus necessary to make me chuckle?
Yes, I've been using Grammarly for several years now. I still use it in conjunction with ChatGPT. It's efficient in correcting spelling and basic grammar errors. However, more advanced features are only available to premium users. At present, their $12/m fee is a bit steep for me.
The more advanced features of chatgpt are $20/m as I’m sure you’re aware.
What do you get out of ChatGPT in this realm? I feel very annoyed by its constant tropes and predictable style. Is that something you don't need to care about?
Interesting. I've wondered how useful the AI stuff added to Microsoft Office would be. Does that mean there will be a "fix my grammar" button like in the example above?
This reminds me of chain letters of old. "This guy ignored the letter, then his house burned down. But he found the letter back, sent it to 50 people, and lo and behold he won the lottery the very next day and was able to build a better house."
When prompted with a post dripping in snark, who aside from a masochist with nothing better to do is going to produce examples so they can be nitpicked to death? Posts like yours do not come off like wanting a discussion, they come off like angling for a fight.
Meanwhile, my sidebar of chat history is about five pages long and ever-growing. Quite a lot of my scripting in the past few weeks has been done with ChatGPT's help. So on one hand I have the angry skeptics who practically scream that it is not doing the things I can see it doing, who appear to be incapable of discussing the topic without resorting to reductive disparagement, and on the other hand I can see the tasks it's accomplishing for me.
> My hunch is the people that are truly irked by this are too deep / close to the subject and because it doesn't match up with what they've worked on, studied, invested time, mental energy into, well then of course it's hot garbage and 'bad'.
That's quite the straw man you've built. Recognizing the limitations of a technology is not the same as calling it hot garbage.
As a language model it's amazing, but I agree with the GP. It's not intelligent. It's very good at responding to a series of tokens with its own series of tokens. That requires a degree of understanding of short scale context that we haven't had before in language models. It's an amazing breakthrough.
But it's also like attributing the muscle memory of your hand to intelligence. It can solve lots of problems. It can come up with good configurations. It is not, on its own, intelligent.
> It seems like a LOT of people think it's a replacement for Google/search engines
Well, that "lot" includes the highest levels of management from Microsoft and Google, so maybe the CAPS are justified. And the errors we're talking about here are errors produced by said management during demos of their own respective product. You would think they know how to use it "correctly".
But the question is, are they wrong in that they don't know how to use / promote an otherwise good product, or are they wrong because they are choosing to put forward something that is completely ill-suited for the task?
"Just don't use it" is not salient advice for non-technical people who don't know how it works, and are misled by basically dishonest advertising and product packaging. But hopefully the market will speak, users at large will become educated about its limits via publicized blunders, and these products will be correctly delimited as "lies a lot but could be useful if you are able/willing to verify what it says."
I think the original sentence was written more in the "your loss is my gain" competitive-advantage vein. The real trick is, as you say, to critically assess the output, and many people are incapable of that.
I feel similarly reading many critiques, but honestly the GP is one of the more measured ones that I've read - not sure that your comment is actually all that responsive or proportionate.
> Maybe it's more about using it to reason through your problem in a dialog, and not just ask it something you can google/duckduckgo.
Your experience with it sounds very similar to my own. It exhibits something like on-demand precision; it's not a system with some fundamental limit to clarity (like Ted Chiang via his jpeg analogy, and others, have argued): it may say something fuzzy and approximate (or straight up wrong) to begin with but—assuming you haven't run into some corner where its knowledge just bottoms out—you can generally just tell it that it made a mistake or ask for it to elaborate/clarify etc., and it'll "zoom in" further and resolve fuzziness/incorrect approximation.
There is a certain very powerful type of intelligence within it as well, but you've got to know what it's good at to use it well: from what I can tell it basically comes down to it being very good at identifying "structural similarity" between concepts (essentially the part of cognition which is rooted in analogy-making), allowing it to very effectively make connections between disparate subject matter. This is how it's able to effectively produce original work (though typically it will be directed there by a human): one of my favorite examples of this was someone asking it to write a Lisp program that implements "virtue ethics" (https://twitter.com/zetalyrae/status/1599167510099599360).
I've done a few experiments myself using it to formalize bizarre concepts from other domains, and its ability to "reason" in both domains to make decisions about how to formalize, and then to generate formalizations, is very impressive. It's not enough for me to say it is unqualifiedly "intelligent", but imo its ability to do this kind of thing makes it clear why calling it a search engine, or something merely producing interpolated averages (a la Chiang), is so misleading.
Just to flip this around for a second, with both of your examples, it sounds like you may have a problem with writer's block or analysis paralysis, and ChatGPT helped you overcome that simply due to the fact that it isn't afraid of what it doesn't know. If that helps you, go for it.
On the other hand, it could also help you to just write a random plan or try a few random things when you get stuck, instead of trying to gaze deeply into the problem for it to reveal its secrets.
> Yeah, I read this sentiment all the time and here's what I always say – just don't use it. Leave it to the rest of us if it's so wrong / off / bad.
If it were only a matter of private, individual usage, I'd be fine with it. If that's all you're asking for, we can call it a deal. But it isn't, is it?
> THEN last night I got stuck on a funny syntax issue that googling could never find the answer to. I got into a conversation with ChatGPT about it, and after it first gave me the wrong answer, I told it that I need this solution for the latest dotnet library that follows the 'core' language syntax. It apologized! And then gave me the correct answer…
Great that it worked for you. I had a similar problem (Google had no answer), but more complex than a syntax issue. I'm also a domain expert in what I was asking, and ChatGPT also gave me a wrong answer the first time, then apologized and gave me a wrong answer again. I explained what was wrong and it did it again and again, never providing the correct answer, so I just gave up and used my human brain. Seems like your problem was in-distribution.
In other news I asked it to make a list of all the dates in 2023 that were neither weekends nor US federal holidays and it left Christmas Day on the list.
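For comparison, this task is easy to do deterministically. A sketch in Python, assuming the standard rules for the eleven US federal holidays as of 2023 (including Juneteenth) and the usual observance rule that a Saturday holiday is observed on the Friday before and a Sunday holiday on the Monday after. The helper names are made up for this example, and edge cases such as next year's New Year's Day being observed on December 31 are ignored.

```python
from datetime import date, timedelta
from typing import List, Set

MON, THU = 0, 3  # date.weekday(): Monday is 0, Sunday is 6


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """The n-th given weekday of a month, e.g. the 3rd Monday of January."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def last_weekday(year: int, month: int, weekday: int) -> date:
    """The last given weekday of a month, e.g. the last Monday of May."""
    if month == 12:
        last = date(year, 12, 31)
    else:
        last = date(year, month + 1, 1) - timedelta(days=1)
    return last - timedelta(days=(last.weekday() - weekday) % 7)


def observed(d: date) -> date:
    """Saturday holidays move to Friday, Sunday holidays to Monday."""
    if d.weekday() == 5:
        return d - timedelta(days=1)
    if d.weekday() == 6:
        return d + timedelta(days=1)
    return d


def federal_holidays(year: int) -> Set[date]:
    fixed = [date(year, 1, 1), date(year, 6, 19), date(year, 7, 4),
             date(year, 11, 11), date(year, 12, 25)]
    floating = [
        nth_weekday(year, 1, MON, 3),   # Martin Luther King Jr. Day
        nth_weekday(year, 2, MON, 3),   # Washington's Birthday
        last_weekday(year, 5, MON),     # Memorial Day
        nth_weekday(year, 9, MON, 1),   # Labor Day
        nth_weekday(year, 10, MON, 2),  # Columbus Day
        nth_weekday(year, 11, THU, 4),  # Thanksgiving
    ]
    return {observed(d) for d in fixed} | set(floating)


def business_days(year: int) -> List[date]:
    """Every date in the year that is neither a weekend nor an observed holiday."""
    holidays = federal_holidays(year)
    d, days = date(year, 1, 1), []
    while d.year == year:
        if d.weekday() < 5 and d not in holidays:
            days.append(d)
        d += timedelta(days=1)
    return days
```

Under these assumptions, `business_days(2023)` has 249 entries. Christmas (Monday, December 25) is excluded. So is Friday, November 10, because Veterans Day fell on a Saturday.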
Yea, I think people hide “the magic smoke” by using complex queries and then filling in the gaps of chatGPT’s outputs with their own knowledge, which then makes them overvalue the output. Strip that away to simple examples like this and it becomes more clear what’s going on. (I think there IS a lot of value for them in their current state because they can jog your brain like this, just not to expect it to know how to do everything for you. Think of it as the most sophisticated rubber duck that we’ve made yet).
I don't understand this take. These LLM-based AIs provide demonstrably incorrect answers to questions, they're being mass-marketed to the entire population, and the correct response to this state of affairs is "Don't use it if you don't know how"? As if that's going to stop millions of people from using it to unknowingly generate and propagate misinformation.
Isn't that what people said about Google Search 20 years ago: that people won't know how to use it, that they will find junk information, etc.? And they weren't entirely wrong, but that doesn't mean web search isn't useful.
No, I don't recall anyone saying that. They mostly said "this is amazingly effective at finding relevant information compared to all other search engines." Google didn't invent the Web, so accusing it of being responsible for non-factual Web content would have been a strange thing to do. Bing/Chat-GPT, on the other hand, is manufacturing novel non-factual content.
That’s a good point. I don’t think anyone is denying that GPT will be useful though. I’m more worried that because of commercial reasons and public laziness / ignorance, it’s going to get shoehorned into use cases it’s not meant for and create a lot of misinformation. So a similar problem to search, but amplified.
There are some real concerns for a technology like ChatGPT or Bing's version or whatever AI. However, a lot of the criticisms are about the inaccuracy of the model's results. Saying "ChatGPT got this simple math wrong" isn't as useful or meaningful a criticism when the product isn't being marketed as a calculator or some oracle of truth. It's being marketed as an LLM that you can chat with.
If the majority of criticism was about how it could be abused to spread misinformation or enable manipulation of people at scale, or similar, the pushback on criticism would be less.
It's nonsensical to say that ChatGPT doesn't have value because it gets things wrong. What makes much more sense is to say that it could be leveraged to harm people, or manipulate them in ways they cannot prevent. Personally, it's more concerning that MS can embed high-value ad spots in responses through this integration, while farming very high-value data from the users, wrt advertising and digital surveillance.
> It's being marketed as an LLM that you can chat with.
... clearly not, right? It isn't just being marketed to those of us who understand what an "LLM" is. It is being marketed to a mainstream audience as "an artificial intelligence that can answer your questions". And often it can! But it also "hallucinates" totally made up BS, and people who are asking it arbitrary questions largely aren't going to have the discernment to tell when that is happening.
Great write up. My experience is spot on with your examples.
> I've noticed that it's not obvious to most people that it's mostly about knowing the domain well enough to ask the right question(s). It's not magic, it won't think for you.
Absolutely right with the part of knowing the domain.
I do not entertain or care about the AI fantasies because ChatGPT is extremely good at getting me other information. It saves me from opening a new tab, formulating my query and then hunting for the information. I can save that extra time for what latest / relevant information I should grab from Google.
Google is still in my back pocket for the last mile verification and judgement. I am also skeptical of the information ChatGPT throws out (such as old links). Other than that, ChatGPT to me is as radical as putting the url and search bar into one input. I just move faster with the information.
When did they say it’s garbage? They gave their opinions on its shortcomings and praised some of the things it excels at. You’re calling the critics too emotional but this reply is incredibly defensive.
Your anecdotes are really cool and a great example of what GPT can do really well. But as a technical person, you’re much more aware of its limitations and what is and isn’t a good prompt for it. But as it is more and more marketed to the public, and with people already clamoring to replace traditional search engines with it, relying on the user to filter out disinformation well and not use it for prompts it struggles with isn’t good enough.
I too have a very positive experience. I ask specific questions about algorithms and how technical projects work and I enjoy its answers. They won’t replace my need to visit a real search engine, nor do I take them at face value. But as a starting point for any research I think it’s an amazing tool. It’s also quite good for marketing stuff, like writing e-mails, cover letters, copy for your website, summarizing or classifying text, and all language related stuff.
People think it’s Cortana from Halo and ask existential questions or they’re trying to get it to express feelings.
I think the "AI" part of its presentation created too many expectations of what it can do.
This doesn't seem like a response to your parent comment, which in no way suggested they were "irked" by this or consider it bad. It was an insightful comment contrasting strengths and weaknesses of these language models. It's a pretty weak rebuttal in my view to just say "there are no weaknesses, you're just doing it wrong!".
If only a small subset of people online are able to truly take advantage of ChatGPT, then I don't think Google is as threatened by it as many have portrayed.
Sentient AIs in science fiction are always portrayed as being more-or-less infallible, at least when referencing their own knowledge banks.
Then ChatGPT comes along and starts producing responses good enough that people feel like it's almost a sentient AI. And they suddenly start expecting it to share the infallibility that fictional AIs have always possessed.
But it's not a sentient AI. It's just a language model. Just a beefed up auto-correct. I'm very impressed by just what capabilities a language model gets when you throw this many resources at it (it seems to be able to approximate logic and arithmetic to decent accuracy, which is unexpected).
Also... even if it was a sentient AI, why would it be infallible? Humans are sentient, and nobody ever accused us of being infallible.
>But it's not a sentient AI. It's just a language model. Just a beefed up auto-correct.
There is a large space between "sentient" and "beefed up autocorrect". Why do people insist on going for the most reductive description they can muster?
Don't mistake my "reductive description" as disapproval. I'm actually really impressed and I see a bright future.
But I think it's really important that we don't give GPT3 more credit than it deserves.
All the discourse around GPT3 and its derivatives like GitHub Copilot or ChatGPT has shown that people (even tech-literate people) have a strong bias towards anthropomorphising it as some kind of "proto-sentient" AI.
In my opinion, this bias is actually very damaging to the reputation of language models. People start expecting way too much from them, and then feel confused or even betrayed when the language models start confidently spouting bullshit.
Also, I don't think "beefed up autopredict" is reductive at all. (Though I might have said "beefed up autocorrect" in my previous comment, whoops.) The 10,000 ft view of its runtime architecture is identical to autopredict: you take some input context, encode it into tokens, feed them into a neural network, and get a prediction of the next few words.
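To make that "10,000 ft view" concrete, here is a toy autopredict loop. It is a minimal sketch under heavy simplifying assumptions: "tokens" are just lowercase whitespace-separated words, and the neural network is replaced by a table of bigram counts (the names train_bigram and predict_next are made up for illustration). Only the loop shape (context in, most likely next token out, append, repeat) is what the comment says GPT shares with autopredict; everything that makes GPT3 good lives in the predictor this sketch swaps out.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count which word follows which; in this toy, these counts ARE the model."""
    words = corpus.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def predict_next(model: dict[str, Counter], context: str, n: int = 3) -> list[str]:
    """Greedily extend the context one word at a time, always taking the most frequent follower."""
    out = context.lower().split()
    for _ in range(n):
        candidates = model.get(out[-1])  # .get does not create entries in the defaultdict
        if not candidates:
            break  # never saw anything follow this word, so stop early
        out.append(candidates.most_common(1)[0][0])
    return out[len(context.split()):]  # only the newly predicted words
```

With the corpus "the cat sat on the mat the cat ate the fish", predict_next(model, "the", 3) returns ["cat", "sat", "on"].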
The innovation in GPT3 has nothing to do with moving away from that basic architecture.
The innovation is all about improving the encoding of tokens, changing the internal architecture of the neural network to support larger models and make better predictions, improving the training process, and finally the training data itself. They also massively increased the size of both the training set and the model... but I'm not sure that counts as innovation.
IMO, GPT3 is actually way more impressive once you start thinking of it as nothing more than a beefed up autopredict.
It is beefed up auto predict in the same way that brains are beefed up electrochemical integrators. While technically true, it is maximally uninsightful. We should be aiming for maximum insight, not the opposite.
I get the impression that you are massively underselling auto-predict.
If you took GPT3's architecture and scaled it down to the size and training set of a typical auto-predict, it would produce near identical results. You wouldn't be able to tell the two apart.
Likewise, if we took an auto-predict architecture from 8 years ago, scaled it up to the size of GPT3 and could train it on GPT3's training set, it would produce similar output to GPT3 and we would see the exact same emergent intelligence capabilities. (Though it's probably not possible to complete training in a practical time-frame; the real innovation of GPT3 was optimising the architecture to make training such a large model practical.)
I think it's very insightful to point out just how similar the two are, because it shows the capabilities of language models are not due to any architectural element, but are emergent from the model and its training data.
It also makes me excited for what will happen when they move beyond the "beefed up auto-predict" architecture. (Arguably, Bing has taken a small step in this direction by bolting a search engine onto it)
>If you took GPT3's architecture and scaled it down to the size and training set of a typical auto-predict, it would produce near identical results.
This is almost certainly not true. The number of parameters is an important feature related to the quality of output. If you scaled the architecture down significantly, it would be significantly less capable[1]. But perhaps I misunderstand your point.
>Likewise, if we took an auto-predict architecture from 8 years ago, scaled it up to the size of GPT3 and could train it on GPT3's training set, it would produce similar output to GPT3
This is also not true. The transformer is a key piece in the emergent abilities of language models. The difficulties in scaling RNNs are well known. Self-supervised learning is powerful, but it needs to be paired with a flexible architecture to see the kinds of gains we see with LLMs.
Stacked Transformers with self-attention are extremely flexible in finding novel circuits in service to modelling the training data. The question is how to characterize this model in a way that doesn't short-sell what it is doing. Reductively describing it in terms of its training regime is just to treat the resulting model as explanatorily irrelevant. But the complexity and the capabilities are in the information dynamics encoded in the model parameters. The goal is to understand that.
> If you scaled the architecture down significantly, it would be significantly less capable[1]. But perhaps I misunderstand your point.
No, my point is that if you scaled a transformer-based architecture down to the equivalent parameter size and training set of a typical 2015-era auto-predict, it would produce near identical results to a 2015-era auto-predict.
> The difficulties in scaling RNNs are well known
The scaling issues in training RNNs are completely irrelevant to my point.
Transformers are computationally equivalent to RNNs. It's possible to convert a pre-trained Transformer model into an RNN [1]. There is nothing magical about the Transformer architecture that makes it better at generation.
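The thread does not include the reference behind [1], so this is only a sketch of the general idea under a stated assumption: it uses linearized (kernel) attention with the feature map elu(x) + 1, not standard softmax attention, and it drops learned projections, scaling, and multiple heads. With that swap, causal attention can be computed either "Transformer style" (every position looks back over all earlier keys) or "RNN style" (a fixed-size running state updated one token at a time), and the two give the same numbers. For ordinary softmax attention the rewrite is not exact, so converting an existing pre-trained model this way generally involves approximation or fine-tuning. All function names here are hypothetical.

```python
import math


def phi(x: list[float]) -> list[float]:
    """Feature map elu(x) + 1: x + 1 for x > 0, exp(x) otherwise, so always positive."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_parallel(Q, K, V):
    """Transformer view: position t re-reads every key/value at positions 0..t."""
    out = []
    for t in range(len(Q)):
        q = phi(Q[t])
        w = [dot(q, phi(K[s])) for s in range(t + 1)]  # unnormalized attention weights
        total = sum(w)
        out.append([sum(w[s] * V[s][j] for s in range(t + 1)) / total
                    for j in range(len(V[0]))])
    return out


def attention_recurrent(Q, K, V):
    """RNN view: carry a fixed-size state (S, z) forward one token at a time."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of outer(phi(k_s), v_s)
    z = [0.0] * d_k                         # running sum of phi(k_s)
    out = []
    for q_t, k_t, v_t in zip(Q, K, V):
        fk = phi(k_t)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v_t[j]
        fq = phi(q_t)
        denom = dot(fq, z)
        out.append([sum(fq[i] * S[i][j] for i in range(d_k)) / denom
                    for j in range(d_v)])
    return out
```

The recurrent version keeps a state of size d_k × d_v no matter how long the sequence gets, which is what makes it an RNN; the parallel form is what makes training easy to spread across hardware, which is closer to the scaling argument made in the next comment.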
>it would produce near identical results to a 2015 era auto-predict.
I don't know that this is true, but it is plausible enough. But the benefit of Transformers is that they are stupid easy to scale. It is at scale that they are able to perform so remarkably across so many domains. Comparing the function of underparameterized versions of the models, and concluding that some class of models are functionally equivalent due to their equivalent performance in underparameterized regimes, is a mistake. The value of an architecture is in its practical ability to surface functional models. In theory, an MLP with enough parameters can model any function. But in reality, finding the model parameters that solve real world problems becomes increasingly difficult. The inductive biases of Transformers are crucial in allowing them to efficiently find substantial models that provide real solutions. The Transformer architecture is doing real, substantial, independent work in the successes of current models.
Because the average person you speak to would consider beefed up autocorrect to be near magic as it is. Once you get near the limits of an individual's comprehension, adding more incomprehensible statements/ideas doesn't really change much; their answer is still 'magic'.
The lack of consistency is a big issue. It may well be able to organize your trip to Mexico, but then it tells me that "the product of two primes must be prime because each factor is prime" ... how will one ever trust it? Moreover, how to use it?
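The slip in that quote is easy to check by hand: if p and q are primes, then p divides p·q and 1 < p < p·q, so p·q has a nontrivial divisor and is composite by definition. A brute-force check (trial division, only meant for small numbers; the function names are made up for illustration) shows the same thing:

```python
def is_prime(n: int) -> bool:
    """Trial division: fine for small n, far too slow for large ones."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_products(limit: int) -> list[int]:
    """Every product p*q of primes p <= q < limit that is itself prime (there should be none)."""
    primes = [p for p in range(2, limit) if is_prime(p)]
    return [p * q for i, p in enumerate(primes) for q in primes[i:] if is_prime(p * q)]
```

prime_products(50) returns an empty list, and so would any larger limit, for the reason above.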
If a Tesla can get you there with 1% human intervention, but that happens to be the 1% that would have killed you had you not intervened ... how do we interface with such systems?
I find interacting with ChatGPT strangely boring. And Copilot is neat but I'm not blown away by it. However... just for laughs I threw some obfuscated genetic algorithm code I'd written at ChatGPT and asked it to guess what the code did. It identified the purpose of the code and speculated on the meaning of certain parameters that weren't clear in the sample I'd presented it. Pretty impressive.
I also showed it some brainfuck code for generating a Mandelbrot set, and it immediately identified it. From that point forward, though, it thought all other brainfuck code generated Mandelbrot sets.
"Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We administer classic false-belief tasks, widely used to test ToM in humans, to several language models, without any examples or pre-training. Our results show that models published before 2022 show virtually no ability to solve ToM tasks. Yet, the January 2022 version of GPT-3 (davinci-002) solved 70% of ToM tasks, a performance comparable with that of seven-year-old children. Moreover, its November 2022 version (davinci-003), solved 93% of ToM tasks, a performance comparable with that of nine-year-old children. These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models' improving language skills."
We're just seeing the standard hype cycle. We're in the "Peak of Inflated Expectations" right now. And a lot of people are tumbling down into the "Trough of Disillusionment"
Behind all the hype and the froth there are people who are finding uses and benefits - they'll emerge during the "Slope of Enlightenment" phase and then we'll reach the "Plateau of Productivity".
Except what’s not conveyed here is social pushback due to any perceived harms. Look at social media and “big tech”: the world, despite high usage, is now casting these as social ills ready for dismantling. This hype-cycle model fits better when a technology's potential to erode the social fabric is much smaller. The forces here will be too strong.
I agree completely with the first part of your post. However, I think even performing these language games should definitely be considered AI. In fact, understanding natural language queries was considered for decades a much more difficult problem than mathematical reasoning. Issues aside, it's clear to me we are closer to solving it than we ever have been.
Sorry, I didn't mean that LLMs are not a subset of AI. They clearly are. What they are not is equal to AI; there are things that are AI that are not LLMs.
It is obvious when I say it, but my internal language model (heh) can tell a lot of people are not thinking that way when they speak, and the latter is often more reliable than how people claim they are thinking.
I think the problem here is in classifying what the "I" is in the first place. For us to answer the question of what equals AI, we must first answer the question of what equals human intelligence in a self-consistent, logical, parsable manner.
I think the ultimate problem with AI is that it's overvalued as a technology in general. Is this "amazing level of comprehension" really that necessary given the amount of time/money/effort devoted to it? What's become clear with this technology that's been inaccurately labeled as "AI" is that it doesn't produce economically relevant results. It's a net expense any way you slice it. It's like seeing a magician perform an amazing trick: it's both amazing and entirely irrelevant at the same time. The "potential" of the technology is pure marketing at this point.
I mean, I take it that if I stuck you back in 1900 you'd say the same about flying: "Look at all this wasted effort for almost nothing." And then the world changed rapidly, and within around 50 years we were sending things to space.
Intelligence isn't just one thing; really, I would say it's the emergent behavior of a bunch of different layers working together, the LLM being just one layer. As time goes on and we add more layers to it, the usefulness of the product will increase. At least from the selfish perspective of a corporation, whoever can create one of these intelligences may be able to free themselves of massive amounts of payroll by using the AI to replace people.
The potential of AI should not be thought of any differently than the potential of people. You are not magic, just complicated.
I don't get the point of comparing apples to make a point about oranges. Flying isn't AI. Nor is "progress" a permanent state. If you want to stay with the flying comparison: in 2000 you could fly from NY to Paris in 3 hours on a Concorde, something no longer possible in 2023. Why? Because economics made it unfeasible to maintain. Silicon Valley has made enough promises using "emergent" behavior and other heuristics to justify poor investments. Unfortunately it has made too many withdrawals from its bank of credibility, and there's not enough left to cloud their exit schemes in hopes and dreams.
And yet every day we still fly faster and farther than the objects designed by evolution. And much like in evolution, some creatures go extinct.
Promises are always meaningless, progress is in the results, and recently LLMs have been giving us results. You can choose to invest or not invest, but in an environment where there is still a lot of investment money around I don't see work on this stopping any time soon.
It seems to me it is really good at writing. I would think it could mostly replace the profession of technical writing, it could help you write emails (bring back Clippy, MS, you cowards), and it could be used as a frontend to an FAQ/self-service help system.
Have you read the article? You'd have to have 100% faith in the tech to allow it to side-step an actual person. Unless your site is purely a click-farm, you're still probably hiring someone to check it--so what's the point of having it?
While I agree with everything you've said, I also see that steady, incremental progress is being made, and that as we identify problems, we're able to fix them. I also see lots of money being thrown at this and enough people finding genuine niche uses for it that I see it continuing on. Wikipedia was trash at first, as were so many other technologies. But there was usually a way to slowly improve it over time, early adopters to keep the cash flowing, identifiable problems with conventional solutions, etc.
> new type of never-before-seen crap code that nobody can or will be able to debug short of throwing it away and starting over
The good thing is that we've been dealing with exactly the same type of code here and there for decades already. Actually, every time I see a commercial codebase that isn't a tangled ball of spaghetti, I thank the gods for it, because that is not the rule but the exception.
What I really wonder is what it will be like when the next version of the same system is coded from the ground up by the next version of the same ML model.
> the language portion of your brain does not do logic
This seems ... Wrong? I suppose that most of what we generally call high-level logic is largely physically separate from some basic functions of language, but just a blanket statement describing logic and language as two nicely separate functions cannot be a good model of the mind.
I also feel like this goes to the core of the debate, is there any thought going on or is it just a language model; I'm pretty sure many proponents of AI believe that thought is a form of very advanced language model. Just saying the opposite doesn't help the discussion.
I’m not sure whether the hype train is going to crash, or whether only a few very smart companies, using language models for what they’re really good at (i.e., generating non-critical texts), will manage to revolutionize one field.
We’re at the very beginning of the wave, so everybody is a bit overly enthusiastic, dollars are probably flowing, and ideas are popping up everywhere. Then will come a harsh selection step. The question is what the remains will look like, and how profitable they’ll be: enough to build an industry, or just a niche.
> GPT is basically the language portion of your brain. The language portion of your brain does not do logic. It does not do analyses.
I like this analogy as a simple explanation. To dig in though, do we have any reason to think we can’t teach a LLM better logic? It seems it should be trivial to generate formulaic structured examples that show various logical / arithmetic rules.
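As a rough sketch of what "formulaic structured examples" could look like, here is a tiny generator of prompt/answer pairs for one syllogism template and for addition. The templates, vocabulary, and function names are all hypothetical illustration, not any real training pipeline.

```python
import random

# Hypothetical templates and vocabulary; a real dataset would need far more variety.
CATEGORIES = [("cats", "animals"), ("squares", "rectangles"), ("oaks", "trees")]
NAMES = ["Tom", "Ada", "Max"]


def syllogism(rng: random.Random) -> tuple[str, str]:
    """All A are B; x is an A; therefore x is a B."""
    sub, sup = rng.choice(CATEGORIES)
    name = rng.choice(NAMES)
    return (f"All {sub} are {sup}. {name} is one of the {sub}. "
            f"Is {name} one of the {sup}?", "Yes")


def addition(rng: random.Random) -> tuple[str, str]:
    """Arithmetic with the answer computed exactly, so every label is correct."""
    x, y = rng.randint(0, 999), rng.randint(0, 999)
    return (f"What is {x} + {y}?", str(x + y))


def make_dataset(n: int, seed: int = 0) -> list[tuple[str, str]]:
    """Seeded, so the same call always yields the same examples."""
    rng = random.Random(seed)
    generators = [syllogism, addition]
    return [rng.choice(generators)(rng) for _ in range(n)]
```

Whether a model trained on data like this learns the rule or only the template is exactly the open question the replies argue about.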
Am I thinking about it right to envision that a deep NN has free parameters to create sub-modules like a “logic region of the brain” if needed to make more accurate inference?
"To dig in though, do we have any reason to think we can’t teach a LLM better logic?"
Well, one reason is that's not how our brains work. I won't claim our brains are the one and only way things can work, there's diversity even within human brains, but it's at least a bit of evidence that it is not preferable. If it were it would be an easier design than what we actually have.
I also don't think AIs will be huge undifferentiated masses of numbers. I think they will have structure, again, just as brains do. And from that perspective, trying to get a language model to do logic would require a multiplicatively larger language model (at minimum; I really want to say "exponentially" but I probably can't justify that... that said, O(n^2) for n = "amount of math understood" is probably not out of the range of possibility, and even that'd be a real kick in the teeth), whereas adjoining a dedicated logic module to your language model will be quite feasible.
AIs can't escape from basic systems engineering. Nothing in our universe works as just one big thing that does all the stuff. You can always find parts, even in biology. If anything, our discipline is the farthest exception in that we can build things in a fairly mathematical space that can end up doing all the things in one thing, and we consider that a serious pathology in a code base because it's still a bad idea even in programming.
This all matches my intuition as a non-practitioner of ML. However, isn’t a DNN free to implement its own structure?
Or is the point you’re making that full connectivity (even with ~0 weights for most connections) is prohibitively expensive and a system that prunes connectivity as the brain does will perform better? (It’s something like 1k dendrites per neuron max right?)
The story of the recent AI explosion seems to be the surprising capability gains of naive “let back-prop figure out the structure” but I can certainly buy that neuromorphic structure or even just basic modular composition can eventually do better.
(One thought I had a while ago is a modular system would be much more amenable to hardware acceleration, and also to interpretability/safety inspection, being a potentially slower-changing system with a more stable “API” that other super-modules would consume.)
> do we have any reason to think we can’t teach a LLM better logic?
I'll go for a pragmatic approach: the problem is that there is no data to teach the models cause and effect.
If I say "I just cut the grass" a human would understand that there's a world where grass exists, it used to be long, and now it is shorter. LLMs don't have such a representation of the world. They could have it (and there's work on that) but the approach to modern NLP is "throw cheap data at it and see what sticks". And since nobody wants to hand-annotate massive amounts of data (not that there's an agreement on how you'd annotate it), here we are.
I call this the embodiment problem. The physical limitations of reality would quickly kill us if we didn't have a well formed understanding of them. Meanwhile AI is stuck in 'dream mode', much like when we're dreaming we can do practically anything without physical consequence.
To achieve full AI, I believe we will eventually have to give our AIs a 'real world' set of interfaces to bounds-check information.
Who’s to say that a large language model is fundamentally incapable of learning some kind of ability to reason or apply logic?
Fundamentally, our brains are not so different, in the sense that we also don't apply some kind of automated theorem solver directly. We get logic as an emergent behavior of a low-level system of impulses and chemical channels. Look at kids: they may understand simple cause and effect, but gradually learn things like proof by contradiction (“I can’t have had the candy because I was in the basement”). No child is born able to apply logic in a way that is impressive to adults, and many adults are not able to apply it well either.
I don’t think LLMs are going to automatically become super-human logicians capable of both complex mathematical proofs and composing logically consistent Homeric epics, but to me there is no reason they could not learn some kind of basic logic, if only because it helps them better model what their output should be.
Really liked your analogy of GPT being similar to the language center of the brain. Almost all current methods to teach GPT deductive logic have been through an inductive approach: giving it training examples of how to do deduction. The thing is, it might be possible to get 80% of the way there with more data and parameters, but a wall will be hit sooner or later.
The combination of natural language communication with search and planning is very exciting. Although it has been overshadowed by the popularity of ChatGPT, [1] demonstrates the capability of using intents generated through a strategic planning stage to drive convincing human-like dialogue (as tested in blitz Diplomacy).
I'm really interested in the creation of human-compatible agents. As you mention, these agents will likely be composed of multiple components which have specialised functionality.
[1] "Human-level play in the game of Diplomacy by combining language models with strategic reasoning", explores the integration of language models with strategic reasoning. https://www.science.org/doi/10.1126/science.ade9097
I have bookmarked your comment, and I hope to have the discipline to come back to it every 3 months or so for the next couple of years, because I think you are right but I hadn't noticed it before. When the real thing comes, we will probably be blindsided.
I hearken back to before the dot-bomb, when people would occasionally ask me to work on "web sites" they'd built with desktop publishing software (e.g. ColdFusion).
They'd hand me the code that somebody would've already hacked on. Oftentimes, it still had the original copyright statements in it. Can't get the toothpaste back in the tube now! Plus it's shitcode. Where is that copy of ColdFusion? Looks of complete dumbfoundment.
Oh gee kids, my mom's calling me for lunch; gotta go!
I'm not sure I understand your point about everyone thinking GPT is real AI. Even luddites don't think that.
I haven't met anyone who thinks that GPT is the end of AI (maybe the beginning).
People are excited about GPT because it offloads certain tasks well enough and accurately enough that it's virtually the first practical use of such tech in the world.
> "We are so amazed by its ability to babble in a confident manner"
But we do this with people - religious leaders, political leaders, 'thought' leaders, venture capitalists, story tellers, celebrities, and more - we're enchanted by smooth talkers, we have words and names for them - silver tongued, they have the gift of the gab, slick talker, conman, etc. When a marketing manager sells a CEO on cloud services, and neither of them know what cloud services are, you can argue that it should matter but it doesn't actually seem to matter. When a bloke on a soapbox has a crowd wrapped around their finger, everyone goes home after and the most common result is that the feeling fades and nothing changes. When two people go for lunch and one asks 'what's a chicken fajita?' and the other says 'a Spanish potato omelette' and they both have a bacon sandwich and neither of them check a dictionary, it doesn't matter.
Does it matter if Bing Chat reports Lululemon's earnings wrongly? Does it matter if Google results are full of SEO spam? It "should" matter but it doesn't seem to. Who is interested enough in finances to understand the difference between "the unadjusted gross margin" and "The gross margin adjusted for impairment charges" and the difference matters to them, and they are relying exclusively on Bing Chat to find that out, and they can't spot the mistake?
I suspect that your fears won't play out because most of us go through our lives with piles of wrong understanding which doesn't matter in the slightest - at most it affects a trivia quiz result at the pub. People with life threatening allergies take more care than 'what their coworker thinks is probably safe'. We're going to have ChatGPT churn out plausible sounding marketing material which people don't read. If people do read it and call, the call center will say "sorry that's not right, yes we had a problem with our computer systems" and that happens all the time already. Some people will be inconvenienced, some businesses will suffer some lost income, society is resilient and will overall route around damage, it won't be the collapse of civilisation.
> When a bloke on a soapbox has a crowd wrapped around their finger, everyone goes home after and the most common result is that the feeling fades and nothing changes.
I mean, until the crowd decides to follow the bloke, the bloke says "Let's kill all the ____", and then we kick off a new world war...
> So people are going to be even more blindsided when someone develops an AI that uses GPT as its language comprehension component
I don't think that would work, because GPT doesn't actually comprehend anything. Comprehension requires deriving meaning, and GPT doesn't engage with meaning at all. It predicts which word is most likely to come next in a sequence, but that's it.
What I think we'd be more likely to end up with is something GPT-esque which, instead of simply generating text, transforms English to and from a symbolic logic language. This logic language would be able to encode actual knowledge and ideas, and it would be used by a separate, problem-solving AI which is capable of true logic and analysis—a true general AI.
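A toy version of that pipeline, heavily simplified and entirely hypothetical: a translate function stands in for the "GPT-esque" component (here it is just two regular expressions, not a model), turning English into (predicate, argument) facts and if-then rules, and a separate forward-chaining loop does the actual inference over those symbols.

```python
import re


def translate(sentence: str) -> tuple[str, str, str]:
    """Hypothetical stand-in for the language component: English -> symbolic form.
    Only two fixed sentence patterns are understood."""
    s = sentence.strip().rstrip(".").lower()
    if m := re.fullmatch(r"everything that is (\w+) is (\w+)", s):
        return ("rule", m.group(1), m.group(2))  # P(x) -> Q(x)
    if m := re.fullmatch(r"(\w+) is (\w+)", s):
        return ("fact", m.group(2), m.group(1))  # P(individual)
    raise ValueError(f"cannot translate: {sentence!r}")


def forward_chain(statements: list[str]) -> set[tuple[str, str]]:
    """Separate reasoning step: apply rules to facts until nothing new is derived."""
    facts: set[tuple[str, str]] = set()
    rules: list[tuple[str, str]] = []
    for sentence in statements:
        kind, a, b = translate(sentence)
        if kind == "rule":
            rules.append((a, b))
        else:
            facts.add((a, b))
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for pred, who in list(facts):
                if pred == premise and (conclusion, who) not in facts:
                    facts.add((conclusion, who))
                    changed = True
    return facts
```

Given "Everything that is human is mortal.", "Everything that is mortal is fallible." and "Socrates is human.", the reasoner derives both ("mortal", "socrates") and ("fallible", "socrates"). Anything outside the two sentence patterns raises ValueError, which is the toy's version of the hard part the comment points at: broad, reliable translation, plus enough training data for the reasoning side.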
The real question, IMO, is if we're even capable of producing enough training data to take such a problem-solving AI to a serious level of intelligence. Scenarios that require genuine intelligence to solve likely require genuine intelligence to create, and we'd need a lot of them.
>Comprehension requires deriving meaning, and GPT doesn't engage with meaning at all. It predicts which word is most likely to come next in a sequence, but that's it.
Why think that "engaging with meaning" is not in the solution-space of predicting the next token? What concept of meaning are you using?
You could argue that GPT has a model of meaning somewhere inside of it, but that's beside the point. If that meaning is hiding latent inside GPT, then it's not accessible to any other system which might want to use GPT as an interface. GPT accepts English as input and produces English as output; that's it.
That said, no, I don't think GPT properly grasps meaning, and my reason for that is simple: It regularly contradicts itself. It can put together words in a meaningful-looking order, but if it actually understood what those words meant as an emergent property of its design, then you wouldn't be able to trick it into saying things that don't make sense or are contradictory. If someone actually understands a subject, they won't make obvious mistakes when they discuss it; since GPT makes obvious mistakes, it can't actually grasp meaning—only brush up against it.
>since GPT makes obvious mistakes, it can't actually grasp meaning—only brush up against it.
This argument doesn't apply to GPT because it isn't a single coherent entity with a single source of truth. GPT is more like a collection of personas, which persona you get is determined by how you query it. One persona may say things that contradict other personas. Even within personas you may get contradictory statements because global consistency is not a feature that improves training performance, and can even hinder it. Samples from its training data are expected to be inconsistent.
It is important not to uncritically project expectations onto LLMs derived from our experiences with human agents. Their architecture and training regime are vastly different from ours, and so we should expect their abilities to manifest differently than analogous abilities in humans. We can easily be misled if we don't adjust our expectations for this alien context.
I get what you mean here but they probably mean referential meaning... having never seen a dog, GPT doesn't really know what a dog is on a physical level, just how that word relates to other words.
I think if you could somehow examine the output of your language model in isolation, you would find it also doesn't "comprehend". Comprehension is what we assign to our higher level cognitive models. It is difficult to introspectively isolate your own language center, though.
The language centres of our brain don't know what a dog is, but they can take the word "dog" and express it on a level that the logic centres of our brain can use. I don't know if "comprehending" is the right word, exactly, but it's transforming information from one medium to another in preparation for semantic and logical analysis.
GPT doesn't do that. What it does is related to meaning, but unlike the language comprehension parts of our brains, which are (presumably) stepping stones between language and reason, GPT doesn't connect to any reasoning thing. It can't. It's not built to interface with anything like that. It just reproduces patterns in language rather than extracting semantic meaning from them in a way that another system can use. I'm not saying that's more or less complicated—just different.
> it's pretty clear that GPT is producing an amazing level of comprehension of what a series of words means
It comprehends nothing at all. It's amazing at constructing sequences of words to which human readers ascribe meaning and perceive to be responsive to prompts.
Exactly. It's like a mouth speaking without a brain. We need a secondary "reasoning" AI that can process GPT's output further, adding time/space coordinates as well as basic logic, including counting, and then maybe we'll see something I can rely on.
> We need a secondary "reasoning" AI that can process the GPT further
We also need "accountability" and "consequences" for the AI, whatever that means (we'd first have to define what "desire" means for it).
In the example from the article, the Bing GPT completely misrepresented the financial results of a company. A human finance journalist wouldn't misrepresent those results due to fear for their loss of reputation, and their desire for fame, money, and acceptance. None of those needs exist for an LLM.
> it's pretty clear that GPT is producing an amazing level of comprehension of what a series of words means. The problem is, that's all it is really doing.
> I have come to two conclusions about the GPT technologies after some weeks to chew on this:
<sarcasm>Just 2 weeks of training data? Surely the conclusions are not final? No doubt a lot has changed over those 2 weeks?
I think the real joke is still, Q: "what is intelligence?" A: "We don't know, all we know is that you are not a good example of it".
I fear these hilarious distortions are only slightly different from those we mortals make all the time. They stand out because we would get things wrong in different ways.
> 1. We are so amazed by its ability to babble in a confident manner that we are asking it to do things that it should not be asked to do.
God, where have we seen this before? The further up the human hierarchy the more elaborate the insanity. Those with the most power, wealth and even those of us with the greatest intellect manage to talk an impressive amount of bullshit. We all do it up to our finest men.
The only edge we have over the bot is that we know to keep our thoughts to ourselves when they don't help our goal.
Here is an idiotic timeline of barely related events, which no doubt describes me better than it describes the topic:
I read about a guy who contributed much to making TV affordable for everyone. He thought it was going to revolutionize learning from home. Finally the audience for lectures given by our top professors could be shared with everyone around the globe!
We got the internet, the information superhighway, and everyone was going to get access to the vast amount of knowledge gathered by mankind. It only took a few decades for Google to put all the books on it. Or wait....
And now we got the large language models. Finally someone who can tell us everything we want to know with great confidence.
These 3 were and will be instrumental in peddling bullshit.
Q: Tell me about the war effort!
what I want to hear: "We are winning! Just a few more tanks!"
what I don't want to hear: "We are imploding the world economy! Run to the store and buy everything you can get your hands on. Cash is king! Arm yourself. Buy a nuclear bunker."
Can one tell people that? It doesn't seem in line with the bullshit we are comfortable with?
> GPT is basically the language portion of your brain. The language portion of your brain does not do logic. It does not do analyses. But if you built something very like it and asked it to try, it might give it a good go.
At least it doesn't have sinister motives (we will have to add those later)
> In its current state, you really shouldn't rely on it for anything. But people will, and as the complement of the Wile E. Coyote effect, I think we're going to see a lot of people not realize they've run off the cliff, crashed into several rocks on the way down, and have burst into flames, until after they do it several dozen times. Only then will they look back to realize what a cockup they've made depending on these GPT-line AIs.
It seems to me that we are going to have to take the high horse and claim the low road.
What I would add to your comment, specifically to criticize the HN hype around it, is that all these GPT "AI" tools are entirely dependent on the OpenAI API. ChatGPT might have shown a glimpse of a spark by smashing two rocks together, but it is nowhere near being able to create a full-blown fire out of it.
Outside of Google and OpenAI, I doubt there is a single team in the world right now that would be capable of recreating ChatGPT from scratch using their own model.
I would love to know how much of ChatGPT is "special sauce" and how much of it is just resources thrown at the problem at a scale no one else currently wants to compete with.
I am not making any implicit claims here; I really have no idea.
I'm also not counting input selection as "special sauce"; while that is certainly labor-intensive, it's not what I mean. I mean more like: are the publicly available papers on this architecture sufficient, or is there some unpublished math being deployed?
> I doubt there is a single team in the world right now that would be capable of recreating ChatGPT from scratch using their own model.
Why not? Lack of know-how or lack of resources? If, say, Baidu decided to spend a billion dollars on this problem, don't you think they have the skills and resources to catch up quickly?
For example, if we had thrown money at a group in 1905, do you think they could have come up with special relativity, or do you believe it required geniuses working on the problem to make the breakthrough?
People with domain expertise in software are going to be amplified 10x by using ChatGPT and curating the results. Likewise with any field that ChatGPT has adequate training data in. Further models will be created that are more specialized to specific fields, so that what their prediction models spew out is much more sophisticated and useful.
I think you're right. I noted on another thread that I got ChatGPT to produce a mostly-right DNS server in ~10 minutes, and it took me just a couple of corrections to make it work.
It worked great for that task because I've written a DNS server before (a simple one) and I've read the RFCs, so it was easy for me to find the few small bugs without resorting to a line-by-line cross-check against a spec that might have been unfamiliar to others.
I expect using it to spit out boilerplate for things you could do just as well yourself will be a lot more helpful than using it to try to avoid researching new stuff (though you might well be able to use it to help summarise and provide restatements of difficult bits to speed up your research/learning as well).
For newly released technologies, it won't be effective until newer models are trained.
Notice how I said it's going to make developers with existing domain knowledge faster.
But even to your point, I've never used Excel VBA before, and I had ChatGPT generate some VBA macros to move data with specific headers and labels from one sheet to another. It wrote a script that did exactly that in ~1 minute, and just reading what it wrote immediately helped me clearly understand how it works. The scripts also work.
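A task like the VBA one above is easy to state precisely, which is part of why a model can get it right. As a rough illustration only (in Python rather than VBA, and treating each sheet as a plain list of rows with the headers in row 0, which is an assumption and not how Excel stores data), copying columns selected by header name might look like this. The function name `copy_columns_by_header` is hypothetical.

```python
from typing import Any, List

Sheet = List[List[Any]]  # a sheet as rows; row 0 holds the headers (assumption)


def copy_columns_by_header(source: Sheet, wanted_headers: List[str]) -> Sheet:
    """Return a new sheet holding only the `wanted_headers` columns of `source`,
    in the requested order. Raises KeyError instead of silently skipping a
    header that is not present."""
    header_row = source[0]
    missing = [h for h in wanted_headers if h not in header_row]
    if missing:
        raise KeyError(f"headers not found in source sheet: {missing}")
    # Look each header up once; if a header name repeats, the first match wins.
    indices = [header_row.index(h) for h in wanted_headers]
    # The header row goes through the same selection, so it stays aligned.
    return [[row[i] for i in indices] for row in source]


source = [["id", "name", "amount"], [1, "a", 10], [2, "b", 20]]
print(copy_columns_by_header(source, ["amount", "id"]))
# [['amount', 'id'], [10, 1], [20, 2]]
```

Failing loudly on a missing header is the sort of detail worth checking for in generated macros, since a silent partial copy is easy to miss.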
What matters is the fundamental background in computer science and server infrastructure. With that, the implementations will be quickly understandable to those who use it.
I asked it to make a 2D fighting game in Phaser 3. I specified which animations it would use, the controls each player would have, that there's a background with X name, what each move does to each player's momentum, and the type of collisions it should do. It spat out something in ~15 minutes (mainly because of all the "continue" commands I had to give) that gets all the major bullet points right; I just have to tweak it a bit to make it functional. The moves are simplified, of course, but uhh, yeah. This is kinda insane. I think you can be hyper-specific about even complex technology, and as long as there is a good history of it online on GitHub, Stack Overflow, and in documentation, it will give you something useful quickly.
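To give a feel for the kind of spec described above (moves that change momentum, plus collision checks), here is a minimal sketch in plain Python. Phaser 3 has its own physics systems for this, and none of the names below are Phaser's API. The move table, the numbers, and the `attack` helper are all made up for illustration. Collision uses axis-aligned bounding boxes (AABB).

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box; (x, y) is the top-left corner."""
    x: float
    y: float
    w: float
    h: float

    def overlaps(self, other: "Box") -> bool:
        # Two AABBs overlap unless one lies entirely to one side of the other.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


@dataclass
class Player:
    body: Box
    vx: float = 0.0  # horizontal velocity
    facing: int = 1  # +1 facing right, -1 facing left


# Hypothetical move table: hitbox reach and the horizontal impulse on a hit.
MOVES = {
    "jab": {"reach": 20.0, "knockback": 50.0},
    "kick": {"reach": 35.0, "knockback": 120.0},
}


def attack(attacker: Player, defender: Player, move: str) -> bool:
    """Spawn the move's hitbox in front of the attacker; on a hit, push the defender."""
    spec = MOVES[move]
    b = attacker.body
    hit_x = b.x + b.w if attacker.facing > 0 else b.x - spec["reach"]
    hitbox = Box(hit_x, b.y, spec["reach"], b.h)
    if hitbox.overlaps(defender.body):
        # Knockback adds momentum in the direction the attacker is facing.
        defender.vx += attacker.facing * spec["knockback"]
        return True
    return False


p1 = Player(Box(0, 0, 30, 60))
p2 = Player(Box(40, 0, 30, 60), facing=-1)
print(attack(p1, p2, "jab"), p2.vx)  # True 50.0
```

Keeping the moves in a data table rather than in code is one way to make a spec like "what each move does to momentum" easy to check line by line.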
It isn't. My exact point was that it isn't, and accordingly ChatGPT is most beneficial to someone who has already done 1, 2, 3 for a given subject.
It was in agreement with the comment above that suggested people with domain expertise will be faster with it.
In those cases, ChatGPT will do 4 far faster, and 5 will be little different.
How often has the solution to a business problem you faced been "write a simple DNS server"? Or are you claiming that it produced a fully featured and world-scale fast DNS server?
Several times. If that were the only thing I got it to do, it wouldn't be very interesting, but the fact that it answered the first problem I threw at it, and several subsequent expansions, with quite decent code was.
Writing a "world-scale fast DNS server" is a near-trivial problem if the data store you look names up in is fast to query. Most people don't know that, because most people don't know how simple the protocol is. As such, it's surprisingly versatile. E.g., want to write a custom service-discovery mechanism? Providing a DNS frontend is easy.
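To make the "the protocol is simple" point concrete, here is a minimal sketch (an illustration, not the commenter's server) of a UDP DNS responder that answers A queries from an in-memory dict, the kind of frontend you might put in front of a service-discovery registry. It assumes single-question queries without name compression, and it ignores EDNS, TCP fallback, and most edge cases in RFC 1035. The `records` map, the names, and port 8053 are hypothetical.

```python
import socket
import struct
from typing import Dict, Tuple

TYPE_A, CLASS_IN = 1, 1


def parse_question(packet: bytes, offset: int = 12) -> Tuple[str, int, int, int]:
    """Read QNAME/QTYPE/QCLASS after the 12-byte header.
    Returns (lowercased name, qtype, qclass, offset just past the question)."""
    labels = []
    while True:
        length = packet[offset]
        offset += 1
        if length == 0:
            break
        if length & 0xC0:
            raise ValueError("compressed names in the question are not supported")
        labels.append(packet[offset:offset + length].decode("ascii"))
        offset += length
    qtype, qclass = struct.unpack("!HH", packet[offset:offset + 4])
    return ".".join(labels).lower(), qtype, qclass, offset + 4


def answer_query(packet: bytes, records: Dict[str, str], ttl: int = 60) -> bytes:
    """Answer one question from `records` (lowercase name -> IPv4 string)."""
    query_id, flags, qdcount = struct.unpack("!HHH", packet[:6])
    if qdcount != 1:
        raise ValueError("this sketch handles exactly one question")
    name, qtype, qclass, end = parse_question(packet)
    question = packet[12:end]

    address = records.get(name)
    if address is not None and qtype == TYPE_A and qclass == CLASS_IN:
        rcode, ancount = 0, 1
        # NAME is a compression pointer (0xC00C) to the question at offset 12,
        # then TYPE, CLASS, TTL, RDLENGTH=4 and the 4-byte IPv4 address.
        answer = struct.pack("!HHHIH", 0xC00C, TYPE_A, CLASS_IN, ttl, 4)
        answer += socket.inet_aton(address)
    else:
        rcode = 0 if name in records else 3  # NOERROR with no data vs NXDOMAIN
        ancount, answer = 0, b""

    # QR=1 (response), AA=1 (authoritative), RD copied from the query, RCODE.
    response_flags = 0x8000 | 0x0400 | (flags & 0x0100) | rcode
    header = struct.pack("!HHHHHH", query_id, response_flags, 1, ancount, 0, 0)
    return header + question + answer


def serve(records: Dict[str, str], host: str = "127.0.0.1", port: int = 8053) -> None:
    """Blocking UDP loop; malformed queries are simply dropped in this sketch."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, addr = sock.recvfrom(512)
            try:
                sock.sendto(answer_query(data, records), addr)
            except (ValueError, IndexError, struct.error):
                continue
```

With something like `serve({"db.service.internal": "10.0.0.5"})` running, `dig @127.0.0.1 -p 8053 db.service.internal` should get an answer. Scaling it is then mostly a question of how fast the lookup behind `records` is.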
How that domain knowledge interacts with ChatGPT's "mostly right" output was the point of my comment, not a DNS server specifically. If you need to implement something you know well, odds are ChatGPT can quickly produce a reasonable outline of it, and someone who already knows the domain well enough will see what is wrong with it and what needs to be refined.
E.g., for fun I just asked it to produce a web server that supports the Ruby "Rack" interface that pretty much all Ruby frameworks support. It output one that would pretty much work, but had plenty of flaws that are obvious to anyone versed in the HTTP spec (the biggest ones: it was single-threaded, and the HTTP parser was too lax). As a starting point for someone unaware of the spec it'd be awful, because they wouldn't know what to look for. As a starting point for someone who has read the spec, it's easy enough to ask for refinements ("split the request parsing from the previous answer into a separate method"; "make the previous answer multi-threaded"). I tried them. Fascinatingly, when I asked it to make it multi-threaded, it spit out a better request-parsing function, likely because the code then started looking more like the Rack integrations it "saw" during training. It ran on the first try, btw, and served up requests just fine.
EDIT: It took just "Make it work with Sinatra", followed by fixing a tiny issue by asking it to "Add support for rack.input", to get to a version that could actually serve up a basic Sinatra app.
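For a sense of what "the HTTP parser is too lax" means in practice, here is a minimal sketch of a stricter request-head parser, written in Python rather than Ruby. It is not the code ChatGPT produced. It returns a CGI-style environment (`REQUEST_METHOD`, `PATH_INFO`, `QUERY_STRING`, `HTTP_*` keys), the shape that both Rack and Python's WSGI build on. The rejection rules shown are a subset of RFC 9112, and `parse_request_head` and `BadRequest` are hypothetical names. Threading is out of scope here.

```python
import re
from typing import Dict

# RFC 9110 "token" characters, used for methods and header field names.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
VERSIONS = {"HTTP/1.0", "HTTP/1.1"}


class BadRequest(ValueError):
    """Raised for anything a strict server should answer with 400."""


def parse_request_head(head: bytes) -> Dict[str, str]:
    """Parse the request line and headers (everything before the blank line)."""
    try:
        text = head.decode("ascii")
    except UnicodeDecodeError as exc:
        raise BadRequest("non-ASCII bytes in request head") from exc

    lines = text.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3:  # exactly one space between the three parts
        raise BadRequest("request line must be 'METHOD target VERSION'")
    method, target, version = parts
    if not TOKEN.fullmatch(method) or version not in VERSIONS:
        raise BadRequest("bad method or HTTP version")
    if not target.startswith("/"):
        raise BadRequest("only origin-form targets are supported in this sketch")

    path, _, query = target.partition("?")
    env = {"REQUEST_METHOD": method, "PATH_INFO": path,
           "QUERY_STRING": query, "SERVER_PROTOCOL": version}

    for line in lines[1:]:
        if "\n" in line or "\r" in line:
            raise BadRequest("bare CR or LF in header section")
        if line[:1] in (" ", "\t"):
            raise BadRequest("obsolete line folding is rejected")
        name, sep, value = line.partition(":")
        # Also rejects whitespace between the field name and the colon.
        if not sep or not TOKEN.fullmatch(name):
            raise BadRequest(f"malformed header line: {line!r}")
        key = name.upper().replace("-", "_")
        if key not in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            key = "HTTP_" + key
        value = value.strip(" \t")
        # This sketch joins repeated headers with a comma.
        env[key] = f"{env[key]}, {value}" if key in env else value

    if version == "HTTP/1.1" and "HTTP_HOST" not in env:
        raise BadRequest("HTTP/1.1 requires a Host header")
    if "CONTENT_LENGTH" in env and not env["CONTENT_LENGTH"].isdigit():
        raise BadRequest("Content-Length must be a single non-negative integer")
    return env
```

Every `raise` here is a case a lax parser tends to accept silently, which is exactly the kind of thing someone who has read the spec knows to ask for.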
Domain knowledge resolves into intuition about solving particular types of problems. All ChatGPT can do about that is offer best-guess approximations of what is already out there in the training corpus. I doubt very much that this exercise is anything but wasted time, so I think that if people with domain knowledge (in a non-trivial domain) use ChatGPT instead of applying that knowledge, they are basically wasting time, not becoming 10x more productive.
Expertise in software is about understanding the problem domain, understanding the constraints imposed by the hardware, and understanding how to translate business logic into code. None of these are significantly helped by AI code assistance as it currently exists. The AI only helps with the coding part, usually by generating boilerplate tailored to your code. That may make you 1.1x more productive, but nowhere near 10x.
I'm surprised you haven't been able to leverage the AI for the analysis of a problem domain and constraints in order to engineer a novel solution. This is generally what I use it for, and not actual code generation.
What, precisely, about (1) is "simply wrong"? You've made a prediction about the usefulness of ChatGPT, but you haven't described why it's wrong to analogize GPT-type models to the language center of a brain.
"To put it in code assistant terms, I expect people to be increasingly amazed at how well they seem to be coding, until you put the results together at scale and realize that while it kinda, sorta works, it is a new type of never-before-seen crap code that nobody can or will be able to debug short of throwing it away and starting over."
I expect ChatGPT to be in a sort of equivalent of the uncanny valley, where any professional who gets to the point that they can routinely use it will also be in a constant war with their own brain to remind it that the output must be carefully checked. In some ways, the 99.99% reliable process used at scale is more dangerous than the 50% reliable process; everyone can see the latter needs help. It's the former where it's so very, very tempting to just let it go.
I'm not saying ChatGPT is 99.99% reliable, just using some numbers for concreteness.
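Taking those illustrative numbers at face value, and additionally assuming that errors are independent across uses (itself an assumption), a quick calculation shows why a highly reliable process still guarantees errors at scale. Here $p$ is the per-use success rate and $n$ is the number of uses:

$$
\begin{aligned}
\mathbb{E}[\text{errors}] &= n(1-p), \qquad \Pr[\text{at least one error}] = 1 - p^{n},\\
p = 0.9999,\ n = 10^{4}:\quad \mathbb{E}[\text{errors}] &= 1, \qquad 1 - 0.9999^{10^{4}} \approx 1 - e^{-1} \approx 0.63,\\
p = 0.9999,\ n = 10^{6}:\quad \mathbb{E}[\text{errors}] &= 100.
\end{aligned}
$$

At $p = 0.5$ the same $10^4$ uses produce about 5,000 errors, which nobody misses; the single expected error at 99.99% is the one nobody is looking for.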
If you were setting out to design an AI that would slip the maximum amount of error into exactly the places human brains don't want to look, it would look like ChatGPT. You can see this in the way that, as far as I know, literally all the ads for GPT-like search technologies included significant errors in their ad copy, even though you would think everyone involved had every motivation to error-check it. This is not merely a "ha ha, silly humans" story... this means something. In a weird sort of way it is a testament to the technology... no sarcasm! But it makes it dangerous for human brains.
Human brains are machines for not spending energy on cognitive tasks. They are very good at it, in all senses of the phrase. We get very good bang-for-the-buck with our shortcuts in the real world. But GPT techs are going to make it really, really easy to not spend the energy to check after a little while.
This is a known problem with human brains. How many people can tell the story of what may be the closest human equivalent, where they got some intern, paid a ton of attention to them for the first two weeks, got to the point where they flipped the "OK they're good now" bit on them, and then came back to a complete and utter clusterfuck at the end of their internship because the supervisor got "too lazy" (although there's more judgment in that phrase than I like, this is a brain thing you couldn't survive without, not just "laziness") to check everything closely enough? They may even have been glancing at the PRs the whole time and only put together how bad the mess is at the end.
I'm not going to invite a technology like this into my life. The next generation, we'll see when it gets here. But GPT is very scary because it's in the AI uncanny valley... very good, very good at hiding the problems from human brains, and not quite good enough to actually do the job.
And you know, since we're not talking theory here, we'll be running this experiment in the real world. You use ChatGPT to build your code, and I won't. You and I personally of course won't be comparing notes, but as a group, we sure will be. I absolutely agree there will be a point where ChatGPT seems to be pulling ahead in the productivity curve in a short term, but I predict that won't hold and it will turn net negative at some point. But I don't know right now, any more than you do. We can but put our metaphorical money down and see how the metaphorical chips fall.
The question I have is whether the tools to moderate ChatGPT and correct its wrong answers should be in place for humans anyway. It's not like human workers are 100% reliable processes, and in some cases we scale human work to dangerous levels.
Ultimately, the best way to make sure an answer is correct is to come to it from multiple directions. If we use GPT and other AI models as another direction it seems like a strict win to me.
Hmm, by multiple directions, what I really meant is, consult a model trained in 2 vastly different ways, or consult a human + a model, or consult a model + a calculator, or a model + human-written test cases. He didn't really go into the idea of "if 2 independent processes can arrive at the same factual statement, it's probably the truth" angle here.
Of course, 2 independent processes is still a tough thing to design when you require this much data, and the data is probably coming from web-scraping, so I think for now, human-generated test cases are still needed.
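A minimal sketch of the "model + independent check" idea for code: treat model-written code as a candidate and accept it only if it agrees with answers obtained some other way. Here the standard library's `calendar.isleap` stands in for "a calculator" or a set of human-written test cases. The leap-year function is a made-up example of plausible-looking generated code, not something from this thread, and `disagreements` is a hypothetical helper name.

```python
import calendar
from typing import Any, Callable, Iterable, List, Tuple


def disagreements(candidate: Callable[[Any], Any],
                  cases: Iterable[Tuple[Any, Any]]) -> List[Tuple[Any, Any, Any]]:
    """Return (input, candidate_output, expected) for every case where the
    model-written `candidate` disagrees with an independently obtained answer."""
    failures = []
    for arg, expected in cases:
        try:
            got = candidate(arg)
        except Exception as exc:  # a crash counts as a disagreement too
            got = exc
        if got != expected:
            failures.append((arg, got, expected))
    return failures


# Hypothetical "generated" code: plausible-looking, but wrong for century years.
def model_is_leap(year: int) -> bool:
    return year % 4 == 0


# Expected answers come from an independent source: the standard library here,
# but human-written cases or a differently trained model would slot in the same way.
cases = [(year, calendar.isleap(year)) for year in (1900, 2000, 2023, 2024)]

print(disagreements(model_is_leap, cases))  # [(1900, True, False)]
```

The check is only as good as the independence of the expected answers; if both sides learned the same mistake from the same data, they will happily agree on it.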
I mean, factual independence likely works for math and repeatable science, but it starts to break down because a lot of interesting things are historical, and determining actual independence is very difficult.
For example, the OpenAI model and the people working for OpenAI who submit input to it are not what I would consider independent. The same goes for any process involving humans. If you're talking about a religion, for example, the most vocal people, and therefore the most likely sources of input, are likely to be strongly for or strongly against it, and no rational middle ground may even exist.
The context of this discussion was a coding question: would you use LLMs as a coding assistant? In general, this method works for things where there's an objective desired result, which includes most coding applications.
1. We are so amazed by its ability to babble in a confident manner that we are asking it to do things that it should not be asked to do. GPT is basically the language portion of your brain. The language portion of your brain does not do logic. It does not do analyses. But if you built something very like it and asked it to try, it might give it a good go.
In its current state, you really shouldn't rely on it for anything. But people will, and as the complement of the Wile E. Coyote effect, I think we're going to see a lot of people not realize they've run off the cliff, crashed into several rocks on the way down, and have burst into flames, until after they do it several dozen times. Only then will they look back to realize what a cockup they've made depending on these GPT-line AIs.
To put it in code assistant terms, I expect people to be increasingly amazed at how well they seem to be coding, until you put the results together at scale and realize that while it kinda, sorta works, it is a new type of never-before-seen crap code that nobody can or will be able to debug short of throwing it away and starting over.
This is not because GPT is broken. It is because what it is is not correctly related to what we are asking it to do.
2. My second conclusion is that this hype train is going to crash and sour people quite badly on "AI", because of the pervasive belief I have seen even here on HN that this GPT line of AIs is AI. Many people believe that this is the beginning and the end of AI, that anything true of interacting with GPT is true of AIs in general, etc.
So people are going to be even more blindsided when someone develops an AI that uses GPT as its language comprehension component, but does this higher level stuff that we actually want sitting on top of it. Because in my opinion, it's pretty clear that GPT is producing an amazing level of comprehension of what a series of words means. The problem is, that's all it is really doing. This accomplishment should not be understated. It just happens that we're basically abusing it in its current form.
What it's going to do as a part of an AI, rather than the whole thing, is going to be amazing. This is certainly one of the hard problems of building a "real AI" that is, at least to a first approximation, solved. Holy crap, what times we live in.
But we do not have this AI yet, even though we think we do.