Ask HN: How does ChatGPT work?
267 points by funerr on Dec 11, 2022 | 178 comments
I'd love a recap of the tech for someone who remembers how ANNs work but not transformers (ELI5?). Why is ChatGPT so much better, too? And how big a network (in weights) are we talking about, for it to retain such diverse knowledge of things?



The way they went from GPT-3 to ChatGPT is really quite genius. My understanding is that it's something like this:

1. Start with GPT-3, which predicts the next word in some text and is trained on all the text on the internet

2. Take thousands of prompts, generate several responses for each of them, and have human reviewers rank the responses for each prompt from best to worst

3. The GPT model needs a massive amount of training data, so it would be cost-prohibitive to get enough human feedback to fine-tune GPT manually. Instead, you train another model, called the reward model, to predict how humans would rate each response. Then you train the GPT model against the reward model millions of times (a rough sketch of this loop follows the list)

4. Feed a small percentage of the outputs from that training process back to the human reviewers to continue training the reward model, prioritized by heuristics like reward-model uncertainty that predict how helpful the human feedback will be for improving the reward model

5. Release ChatGPT to the public, and use user feedback like response upvotes/downvotes to further optimize the reward model, while continuing to train ChatGPT against the reward model
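
Very roughly, the reward-model part (steps 2-3) in toy PyTorch; every name here is made up and this is only a sketch of the idea, not OpenAI's code:

    import torch
    import torch.nn as nn

    # Toy stand-ins: in reality both the encoder and the reward model are
    # large transformers, and the "policy" being tuned is GPT itself.
    embed = lambda text: torch.randn(16)              # fake text encoder
    reward_model = nn.Linear(16, 1)                   # scores a prompt+response pair
    opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

    # Human rankings -> pairwise loss: the preferred response should score higher.
    ranked = [("some prompt", "better response", "worse response")]
    for prompt, better, worse in ranked:
        r_good = reward_model(embed(prompt + better))
        r_bad = reward_model(embed(prompt + worse))
        loss = -torch.log(torch.sigmoid(r_good - r_bad)).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    # Second half of step 3: generate responses with GPT, score them with the
    # trained reward model, and update GPT to maximize that score (PPO in the
    # papers) -- omitted here because it's a whole library in itself.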

https://openai.com/blog/chatgpt/

https://openai.com/blog/deep-reinforcement-learning-from-hum...


> 2. Take thousands of prompts, generate several responses for each of them, and have human reviewers rank the responses for each prompt from best to worst

Step 2 is not that. It's manually writing responses for a few tasks.

> A labeller demonstrates the desired output behavior.

(left side on https://cdn.openai.com/chatgpt/draft-20221129c/ChatGPT_Diagr...)

So this stage is supervised training. Ranking is the next stage, for training the reward model. The model trained here is not the reward model; it's a model that generates sample responses to be used for training the reward model.

So there are two kinds of manual work involved here: manually demonstrating how to solve tasks, and ranking responses. There is even discussion of how much effort to invest in the first versus the second, and what the trade-off is.


Right, I intentionally left off Step 1 from that chart to simplify the explanation, since it didn't seem necessary. Is Step 1 just for creating the ChatGPT content blocker?


I want to know if it will ever be possible to run this kind of AI at home once its training is complete. I don't need all the knowledge, just the subset that I'm interested in.

Actually I'm more interested in its ability to transform things. For example, I can ask it to convert a docker-compose file to a docker run command, it can manipulate JSON, and it can sort numbers in a table when prompted. I'm more interested in these abilities than in just getting answers, for which I already have Google.


It uses GPT-3 under the hood, which requires about 350 gigabytes of GPU VRAM to perform these inferences (back-of-the-envelope: 175B parameters x 2 bytes each in fp16; likely more in practice once activations and overhead are counted).


7x NVIDIA RTX A6000s, so ~$32,550


This is honestly affordable for a lot of upper-middle-class people and might well be worth it. It's like the cost of a car. I can seriously see this writing a book for me if I can get it tuned to study only my writing style and remember all of my texts. But it could also cost only ~$14,000 with 14x RTX 3090s.


I’ll wait a year and buy $2k worth of hardware that runs it.


The difference is first-mover advantage. If you can be the first to use it to bring value to yourself and your clients, you can easily make back the cost of that hardware.


Or wait a further few years and spend $20.

Or a couple years later for it to be 20 cents.

Or a couple years later for them to give it to you for buying a bottle of Coke.

In the interim, they will find ways to make money from us.


5x limit per manufacturer, maybe 3rd party for the additional?


If this were open-sourced it might be optimized quickly; the amount of VRAM required for image generation went down very fast. I'm sure DALL-E 2 still uses enormous amounts of VRAM, but other solutions do not.


So maybe in 5 years we would be able to run it in our $800 smart glasses.


Yeah buddy!

Look up fine tuning GPT-J in 8 bit mode.

People have made domain-specific models that perform well (IIRC, better than GPT-3 in their domain).

The team behind Stable Diffusion is also working on one that's supposed to be pretty good.


you can do that today in the free release?


I think he wants to self-host. It sucks to have no ownership of such a powerful tool; I would pay upwards of $3,000 to be able to self-host something like this.


Rest assured someone is working on a self-hosted (distilled) model. Stable Diffusion has shown there is a viable market for open models that can run inference on consumer hardware.


You forgot the fresh cup of really hot tea :)

https://hitchhikers.fandom.com/wiki/Infinite_Improbability_D...


ChatGPT seems to do (or result in) some amount of caching of responses - there is very little variation when asking the same question multiple times. CharacterAI produces a lot more variety in comparison, making it more helpful for brainstorming. That said, ChatGPT is likely closer to the truth, even if not perfect, for searches. The innovation happening lately is incredible.


There's definitely some live pruning happening, but another factor is that the temperature is turned way down. At zero temperature it's a totally deterministic function, and if it's doing its job you'd hope that similar questions would be mapped very close together in the configuration space.
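
As a toy illustration of what temperature does at sampling time (a numpy sketch, not OpenAI's code):

    import numpy as np

    def sample_next_token(logits, temperature=1.0):
        # temperature -> 0 approaches greedy argmax decoding (deterministic);
        # higher temperature flattens the distribution (more varied output).
        if temperature == 0:
            return int(np.argmax(logits))
        probs = np.exp((logits - np.max(logits)) / temperature)
        probs /= probs.sum()
        return int(np.random.choice(len(logits), p=probs))

    # e.g. logits over a 5-token vocabulary
    print(sample_next_token(np.array([2.0, 1.0, 0.5, 0.1, -1.0]), temperature=0.2))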


Did you try to vary your question or add modifiers or elaborate?


> Take thousands of prompts, generate several responses for each of them, and have human reviewers rank the responses for each prompt from best to worst

Recently I saw an image of Indian women sitting in front of computers, and the caption said they were classifying "AI" responses. I guess that's true, and this kind of work is the new outsourced cheap labour of the AI age.


That Indian woman's idea of acceptable and unacceptable AI responses surely varies from that of a San Francisco tech worker, or a Cape Town motorcycle mechanic, or an English teacher from Liverpool.

I really doubt the mechanical turk method is applicable or even useful for the current state of AI-generated text.


I actually disagree a lot with this. Sure, if you asked something with heavy cultural baggage, that would frequently be a real concern, but when you are primarily trying to bridge the machine-human chasm, our cultural differences among the examples you gave are trivial in comparison. For instance, if you offered an AI personal assistant but the catch was that it would (at least starting out) only have the perspective of an average middle-class Indian person, it would still beat the absolute crap out of "first generation" technology like Siri or Alexa!


It could be a first pass


I personally worked as a « human trainer » for the fine tuning of ChatGPT.

The pay was $50 per hour, which is not bad for a side job as a student.


I'd say. Where do you apply for this kind of work?


That's super interesting. When GPT-3 came out, I wrote an article inspired by it: that we could one day build an AI that acts like AGI, trained on a crazily vast amount of multimedia data collected from users willing to participate in ever-improving AI interactions:

https://medium.com/swlh/bicameral-mind-humanoid-robot-with-g...


When I was a first-year AI student at the beginning of the '90s, I asked my professor what would happen if we just made a massive neural network and trained it with all the information in the world. He said it cannot happen, as it is impossible.


This reminds me of a major US newspaper declaring heavier-than-air flying machines a million years away mere months before the Wright brothers' experiments.


It is impossible; the GPT family of AIs is trained on text, which is a tiny subset of all the information in the world.

The “unphysical” nature of its training is one reason it can so easily “hallucinate” about impossible things as though they were real.


If an elderly but distinguished scientist says that something is possible, he is almost certainly right; but if he says that it is impossible, he is very probably wrong.

Arthur C. Clarke


Considering the computational resources available at the time, he was not that wrong. Research into artificial neural networks has always been held back by the available computational power.


Agreed, but as a professor I believe one needs to be looking to the future. It was not that far out, but yeah, it was an AI winter. We were stuck until 2012, basically. That's a long time.


Easier said than done.

I'm a professor in an AI field, and I can tell you that neither myself nor the colleagues I regularly have scientific discussions with could imagine ten years ago that something like ChatGPT would be possible in 2022. I suppose there might be a minority who called it, but recent advances in deep learning absolutely whooshed past the predictions of the overwhelming majority of people in the field.


Ah yes, that's what I was trying to say. I was the worst sceptic of AI; I never did anything with it after getting my master's. I just went for money, programming and managing programming.

For me [1] this is the most mind-boggling thing I have seen in my life and I don't think people realise what it means. And yes, it whooshed past anything I thought possible in my lifetime. I hate that it's 'not be evil', 'anti thought crime' etc., but it is really incredible what it does.

[0] https://twitter.com/luyben/status/1600663169353015297 [1] https://brainfisheatfishbrain.com/about/


I don't mean to downplay how incredible the tech is, but I'm not sure I'd call this approach genius as it's the industry standard.


Then why were they the first ones to exploit it so effectively?

I don’t think it was standard for GPT models.


I think the issue here was with the term genius, which makes it sound like this was a completely new and revolutionary paradigm.

OpenAI's success mainly stems from extremely well executed existing concepts while mostly ignoring cost. And as they're pretty much the most successful public player in this domain, they've got the first-mover advantage, which they're currently very successfully leveraging. At least that's how it looks from the perspective of an armchair analyst, who wouldn't have been able to achieve the same, even with the same resources and time.

The actual result is absolutely incredible, however, regardless of whether the road to this end was genius or not.


I don't think anything about high-performance GPT models is standard, since they are only a couple years old and only a handful of organizations have developed them


The technique in question has little to do with GPT itself; it involves using ML to generate more training data in an automated fashion, creating a generative training loop, which, as another commenter mentioned, is also the basis behind generative adversarial networks.


Compare it to Generative Adversarial Networks. (There are parallels - pun not initially intended.)


Yes, some variation of this iterative approach has been around for over 2 decades. Look at Oren Etzioni's early work on Open Information Extraction and so on.

I agree that applying this semi-supervised approach to extend GPT for improving dialog models is what is unique, and the results are stunning. But the meta-method has been around for a while.


Ok, but how does it take my code and fix the bug? It's my own code; no one has seen it besides me, and the model wasn't trained on it.


In learning to predict the next token, the model has to pick up lots of little bits of world knowledge. I'm sure someone would disagree with the phrasing of "understand", but it certainly operates with more complexity than, say, a Markov chain. It has seen lots of Python, and in order to predict better, it has developed internal models of how Python works. Think of how much better you'd do predicting the next character of Python code compared to random noise: there's a lot of structure there.


In my (limited) experience it seems to perform even better for typed languages (for example Kotlin/Java/Swift) compared to Python. The Python code it provided often had subtle type issues when working with dates. While the Kotlin date-related code it provided was more accurate and correct in terms of types. Which makes sense since the additional type information likely leads to a much better "internal model of how Kotlin works"

What surprised me was the level of "understanding" it seems to do when providing it with some of my own sample code. It can analyze the code, explain how it works/what it does, use libraries, suggest improvements and apply those improvements.

Have a look at this conversation: https://imgur.com/a/ZtViC3d

While the end result isn't perfect, it's still highly impressive and while I was an AI-skeptic before, I now see the possible benefits of AI assistants for programming.

Some other prompts with very impressive results:

* "Write an implementation for the following Kotlin repository interface: <insert-interface-with-full-type-signatures>."

* (followup) "Add save/load methods that store the backing map in a JSON file"

* (followup) "Replace Gson with Jackson for JSON serialization"

* "Write an Android layout xml for a login form with username/password/loginbutton"

* (followup) "Provide the Kotlin activity code for this layout"

* "Write a Kotlin function that parses a semver input string into a data class"


> In my (limited) experience it seems to perform even better for typed languages (for example Kotlin/Java/Swift) compared to Python. The Python code it provided often had subtle type issues when working with dates. While the Kotlin date-related code it provided was more accurate and correct in terms of types. Which makes sense since the additional type information likely leads to a much better "internal model of how Kotlin works"

I think another possibility here is that they might have used an execution environment to check whether the code the model came up with actually compiles, and used that as an additional signal during training. Some sort of execution environment also seems to me to be a possible explanation for how they got the model to emulate a terminal so well.


It’s not ‘more complexity’ than a Markov chain - it essentially is a Markov chain, just looking at a really deep sequence of preceding tokens to decide the probabilities for what comes next.

And it’s not just looking that up in a state machine, it’s ‘calculating’ it based on weights.

But in terms of ‘take sequence of input tokens; use them to decide probable next token’, it’s functionally indistinguishable from a Markov chain.


I look at deep sequences of tokens and predict what comes next - can you milk me? Once you've broadened "basically a Markov chain" to "any function from a sequence of tokens to a probability distribution over tokens", there's a lot of explanatory power lost. If you had to characterize the difference between a brute-force mapping based on pure frequencies and a model which selectively calculates probabilities based on underlying structure, wouldn't you say the latter had more complexity?

You don't have to believe the hype, but if you think you can get GPT performance out of anything remotely resembling a markov chain, I encourage you to try.
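
For reference, this is what an actual frequency-table Markov chain looks like; it can only replay transitions it has literally seen, which is the contrast I'm drawing:

    from collections import defaultdict, Counter
    import random

    def train_bigram(tokens):
        # Pure lookup table of observed word-to-word transitions.
        table = defaultdict(Counter)
        for prev, nxt in zip(tokens, tokens[1:]):
            table[prev][nxt] += 1
        return table

    def generate(table, start, n=10):
        out = [start]
        for _ in range(n):
            nexts = table.get(out[-1])
            if not nexts:
                break
            words, counts = zip(*nexts.items())
            out.append(random.choices(words, weights=counts)[0])
        return " ".join(out)

    table = train_bigram("the cat sat on the mat and the cat slept".split())
    print(generate(table, "the"))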


There's nothing about Markov chains that says the model has to be based on brute calculation from previously observed frequencies. The point is that the exact behavior of these LLMs could also be modeled as a Markov chain with a sufficiently massive state machine.

Obviously that's impractical and not how LLMs actually work - they derive the transition probabilities for a state from the input, rather than having it pre-baked - but I think from the point of view of saying 'these are more sophisticated than a Markov chain', actually strictly speaking they aren't - they are in fact a lossy compression of a Markov model.


But it seems like the attention mechanism fundamentally isn't markov-like in that at a given position it can pool information from all other positions. So as in the simplest case when trained on masked-language modeling, the prediction of the mask in "Capital of [MASK] is Paris" can depend bidirectionally on all surrounding context. While I guess it's true that in the case where the mask is at the end (for next-token completion), you could consider this as a markov model with each state being the max attention window (2048 tokens I think?), but that's like saying all real-world computers are FSMs: it's technically true, but this isn't the best model to use for actually understanding its behavior.

Since for most inputs that are smaller than the max token length you never actually end up using the markov-ness, calling it a markov model seems like it's just in a way saying it's a function that provides a probability distribution for the next token given the previous tokens. Which just pushes the question back onto how that function is defined.


Could you not use two Markov chains for masked language modeling? One working from the beginning until [MASK] and one working backwards from the end until [MASK]. And then set [MASK] to the average of both chains. If a direct average cannot be found, it is assumed to be a multi-word-expression and words are generated from the two chains until they match.


That seems closer to a BiLSTM?


It's really awesome how good it is at modeling certain world knowledge. It seems to struggle with putting everything in one framework. For example, it still makes a lot of mathematics and logic errors.


How do you fix a bug? You've never seen it before.


This is why this paper was so exciting when it came out https://arxiv.org/abs/2005.14165

Make a large enough model and train it with all sorts of data and it will be able to encode generalized concepts which can then be applied to specific tasks (given only a few examples of the task, or even just a query / question, rather than an example)
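
The "few examples of the task" live right in the prompt; a made-up few-shot prompt in the spirit of the paper's examples looks like this:

    prompt = (
        "Translate English to French:\n"
        "sea otter => loutre de mer\n"
        "peppermint => menthe poivrée\n"
        "cheese =>"
    )
    # No weight updates happen: the model is simply expected to continue the
    # text, and the pattern in the prompt steers it toward "fromage".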


It often does[0], I am doing more experiments currently by writing the fixed code back in place and running tests automatically.

[0] https://brainfisheatfishbrain.com/post/chatgpt-code-reviews/


> 5. Release ChatGPT to the public, and use user feedback like response upvotes/downvotes to further optimize the reward model, while continuing to train ChatGPT against the reward model

Can someone provide a pointer to an article that elaborates on this part?


How does step 1 work? It seems incredibly inefficient to check your word combo against every single segment of text they have. How does it do this efficiently?



That's not genius, that's called unsupervised learning and it is an entire subfield.


I think number 3 is a description of a Generative Adversarial Network (GAN).


Ok, regardless, it is not really new. ML researchers are doing these kinds of things all the time.

By the way, according to some people, GANs are also a kind of unsupervised learning: https://stackoverflow.com/questions/44445778/are-gans-unsupe...


This blog explains some of the key innovations they added on top of GPT-3, especially the natural language understanding (following instructions).

https://openai.com/blog/instruction-following/

In the first few paragraphs they show GPT-3 being just as limited as all the other language models that came before it, and explain why they built InstructGPT.

>> Here is the summary (summarized by ChatGPT): They present their approach to the problem, which involves using a recurrent neural network to encode both the instruction and the environment, and then using a reinforcement learning algorithm to learn how to execute the instruction. They demonstrate that their method can learn to solve a variety of instruction-following tasks.

Some snippets from the blog:

>> InstructGPT is then further fine-tuned on a dataset labeled by human labelers. The labelers comprise a team of about 40 contractors whom we hired through Upwork and ScaleAI.

>> We hired about 40 contractors, guided by their performance on a screening test meant to judge how well they could identify and respond to sensitive prompts, and their agreement rate with researchers on a labeling task with detailed instructions. We kept our team of contractors small because it's easier to have high-bandwidth communication with a smaller set of contractors who are doing the task full-time.


The interesting thing was that there were 0 comments on HN when this blog was posted on Jan 27th. It was such a fundamental breakthrough in NLU. NLU has been such a holy grail of language models.


NLU?


Natural Language Understanding.


To be clear there is no 'understanding' going on, and the article doesn't use that word either.


There should be a rule on HN that anyone claiming definitively whether or not AIs 'know' or 'understand' something should have to define those words rigorously.


The question goes back decades - back then the question was phrased "Can computers think" and the rebuttal was "Can submarines swim?"


On the other hand, airplanes do fly. But I guess that just reinforces the point that language is a map, not the territory, and natural language in particular is full of inconsistencies and fuzzy boundaries between poorly defined categories. English happened to evolve so that it ended up using the same word for the thing both winged creatures and airplanes do, but only living creatures swim, boats and ships and submarines do not. In other languages, submarines do swim.


  > On the other hand, airplanes do fly.
In my language, the verb that airplanes do is different from the verb that birds and insects do. As with the submarine, it is based on a different physical mechanism than the one used by the natural animals in that space.

For that matter, in English, what is the verb that submarines do? Surely it's not "sail".


Submarines do dive, but only when moving vertically downward. I don't think there is a single verb for what submarines do, they can move, travel, maneuver, navigate, or operate (underwater) depending on context.


I see, thank you.


Float?


Pretty much...


I found this description of the GPT-3 transformer architecture useful: https://dugas.ch/artificial_curiosity/GPT_architecture.html

Not eli5 but close enough.


What I don't understand is where is the memory? How does GPT-3 or ChatGPT remember so much information with just that architecture? It would seem that the maximum it could remember is 2048 words.

EDIT: Maybe it's 2048 x 96? Still seems low for what it can do.


175bn weights, at 4 bytes/weight, is about 700GB (or half that at 2 bytes/weight in fp16).


Yes, but how does it remember the stuff you told it earlier in the conversation? Those hundreds of gigabytes are the trained model, and I assume that those weights are not changed by the conversation?


I believe that the previous input, from earlier in the conversation, is always prepended to the new input.
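
Something like this, I assume (a toy sketch of the idea, not OpenAI's actual implementation):

    # The model itself is stateless, so the client re-sends the whole
    # transcript on every turn; "memory" is just a longer prompt.
    history = []

    def chat_turn(user_message, generate):    # `generate` = any next-token model
        history.append(f"User: {user_message}")
        prompt = "\n".join(history) + "\nAssistant:"
        reply = generate(prompt)              # the model only ever sees this one string
        history.append(f"Assistant: {reply}")
        return reply

    # toy stand-in for the real model
    print(chat_turn("Hello", lambda prompt: "Hi there!"))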


So it really is just a fancy Markov chain?


A Markov model is a stochastic process with a discrete visible state where the next state depends only on the current state. GPT’s state is neither discrete nor visible so it is not a Markov model.


No.


There were several key insights that have made something like ChatGPT possible relative to traditional neural networks.

* A fixed (but large) vocabulary of sub-word like tokens as inputs.

* Attention mechanism for learning the correlation of words in a fixed sequence window.

* Implementing this attention mechanism as plain matrix multiplies rather than some other, more complex math, which allows it to be parallelized and run fast on GPUs (a minimal sketch appears at the end of this comment).

* Having enough layers of these transformer blocks to give a huge parameter space: ~175B parameters in the case of ChatGPT.

* Feed the model a lot of data - in this case, pretty much the entire internet as text.

* Self-supervised learning: we take text from the internet and train the network to predict the next word from the words that precede it. No human labels are needed; the text itself is the supervision, and it turns out this works extremely well. We don't use the traditional supervised-learning inputs -> (prediction, label) paradigm that was the standard 10 years ago.

* RLHF (Reinforcement learning from human feedback). Take generated text from GPT-3 (the underlying generative model) and ask humans to rate different completions. Retrain the model from those ratings.

* A massive compute infrastructure that is able to train this model in a reasonable amount of time, allowing for iteration on hyperparameters. For example, what's the optimal attention head-size? How many encoder layers are good? What should the sequence length be? What should the embedding dimension be? etc. In OpenAI's case, they used thousands of GPUs and thousands of CPUs provided by Microsoft/Azure.

In summary, relatively simple model, parallelizable on GPUs, trained on a lot of data.
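
To make the "matrix multiplies" point above concrete, here is a bare-bones numpy sketch of scaled dot-product attention (single head, no learned projection matrices, so a simplification of the real thing):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                  # token-to-token correlations
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
        return weights @ V                               # mix values by attention weight

    # toy usage: 4 tokens, embedding dimension 8
    x = np.random.randn(4, 8)
    out = scaled_dot_product_attention(x, x, x)          # self-attention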


How does it know when to stop when asked for a description or summary? Sometimes it outputs a few sentences, sometimes a few paragraphs.

Does it know how much output it has already provided when deciding on the next token? How does it decide to start a new sentence or paragraph, or if it's 'satisfied' with its current response?


In the same way that it predicts a given token like 'we' or 'write', it predicts a special end-output token, something analogous to 'EOF'. So at some point as it's generating new tokens and coming up with probability distributions for the next token (conditioned on its own output so far), selecting from this distribution (either by greedily picking the highest probability token or sampling), it will select this 'EOF'-style token, and end its own output. Does that make sense?

So to answer your question about knowing how much text it has output so far, not exactly. It is not 'stateful' exactly, it's simply conditioned on previous text and gives a new probability distribution over its vocabulary for the next token. Whether the 'previous text' is its own output, or provided by a human user (a 'prompt'), it isn't necessarily aware of this by default (unless the user vs. system responses are delineated somehow with special tokens for example, which with ChatGPT they may be/are likely to be. My point is that these models don't come with this awareness built in, you have to add it. Fundamentally they simply condition their probability distribution on a piece of existing text).
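
In sketch form (the `next_token` function and the whitespace "tokenizer" here are stand-ins; real systems use subword tokenizers and the trained model):

    EOS = "<|endoftext|>"   # the special "I'm done" token

    def generate(prompt, next_token, max_tokens=50):
        tokens = prompt.split()
        for _ in range(max_tokens):
            tok = next_token(tokens)    # argmax of, or a sample from, the distribution
            if tok == EOS:              # the model itself decides when it is finished
                break
            tokens.append(tok)
        return " ".join(tokens)

    # toy stand-in that emits one word and then the end token
    print(generate("Say done:", lambda toks: "done" if toks[-1] != "done" else EOS))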


Yannic Kilcher did an explainer recently on his YT channel. https://www.youtube.com/watch?v=0A8ljAkdFtg

Yannic explains these models pretty well.


I just skimmed the video. It seems full of information, but this video skips. It seems like either the author composed his sentences by slicing together various clips, or a few frames have been removed between some words. I tried opening the video in NewPipe as well, same problem.


I believe this is a stylistic, 'production values' choice. It used to bother me to the same degree that it seems to bother you. I suspect the content creators that do it are clipping out things like "umm" and "uhh". It might also be that some creators are better at this kind of splicing than others, which might explain why this particular example did not bother me so much.


I see, thank you.


This guy is awesome


I understand the basic idea of predicting the words in a sequence, but what totally eludes me is how this relates to the prompt. After all, you don't give it a sequence to continue, you give it a direct request. Is there some special processing going on here or do they really just take the prompt as is and encode it?


A direct request is just a specific sort of a sequence to complete. There are extra training steps involved in ChatGPT that are designed to make it particularly good at predicting sequences that look like semantically valid, coherent dialogue, but fundamentally there's nothing special about dialogue. It's just tokens after tokens. For example, even a fairly simple model trained on only play and movie scripts would be naturally good at completing pieces of dialogue because that would be all that it knows!
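
For example, under the hood the "request" is plausibly handed to the model as plain text to continue, something like the following (the exact delimiters are a guess; OpenAI hasn't published ChatGPT's internal format):

    prompt = (
        "The following is a conversation with an AI assistant.\n"
        "Human: Why is the sky blue?\n"
        "AI:"
    )
    # The model's "answer" is simply its continuation of this text.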


Has anyone found an architecture diagram that includes the MLOps parts? I'm very interested, at a system level, in how the train/retrain loops work, but haven't found much info on that.


Back when GPT-3 came out, I wanted to understand how it works, so I read the papers and made this post:

https://dugas.ch/artificial_curiosity/GPT_architecture.html

I hoped it would be simple enough for anyone who knows a bit of math / algebra to understand. But note that it doesn't go into the difference between GPT-3 and ChatGPT (which adds a RL training objective, among other things).


Half-OT: people are always talking about ChatGPT being AI, but is this actually the case?

It frequently told me that it doesn't learn from my input, and I had the impression that the unique selling point of AI was being able to modify its own code in response to input.


That's basically never been the case for any statistical AI system; they are all subject to "catastrophic forgetting".

The weights of the system, which are its memory, are set by having their values determined over an entire training set. This is, effectively, a compression process which takes, e.g., most of the internet and compresses it into ~1TB of weights.

If you update those weights by a process which doesn't "zip everything at once", then you're overwriting weight values with data which is less representative than "everything at once".

I'd imagine in ChatGPT's case, OpenAI will review all the transcripts and find some way to do a pass over the whole set at once to make some improvement.

If it were being trained live, its whole memory would be wiped out and replaced with racist/sexist/etc. inputs very quickly, as with MS's chatbot.


Just because it's a bad idea doesn't mean it's the right definition of AI.

I always thought of the current systems as static machine learning models.


There is no exact definition of what "AI" is. But there have been millions and millions of software systems in the world that could be classified as "AI" since the 60s, and approximately none of them have had any means of altering their code on the fly, so no.


It does seem to learn during the session, but it forgets what it has learned afterwards. I noticed this when I made a request in Dutch, to which it replied in kind, albeit with a grammatical error. I pointed out this error and showed it the correct form, to which it replied by thanking me, after which it consistently used the correct form during that session. When I reset the session and asked the same question, it came back with more or less the same reply, including the grammatical error.


It's still a transformer underneath, but OpenAI researchers have figured out how to improve it through engineering effort and improved training data. I believe it's not easy for outsiders without large-model pretraining experience, like most of us, to understand the tuning details.


ChatGPT is great. I use it a lot. But... it is still necessary to use Google for many things. ChatGPT is somewhat out of date, and the strangest thing is that it will almost always provide an answer (right or wrong). So, for the most part, everything has to be manually checked.


How does the non-english languages part work?

I thought maybe they use Google Translate, but then remembered that Russians have trained it not to understand "russophobic" sentences.

-- Mitä tarkoittaa ryssänvastainen, explain in English.

-- Ryssänvastainen means "anti-Russian" or "anti-Russian sentiment." It refers to an attitude or behavior that is hostile or opposed to Russia or Russian interests.


The model says it has been trained on a variety of languages, but its current version is English. Its translations are bad, probably because it uses a translation model on top of its output. Sometimes it is obvious from the mistakes it makes that it is translating English articles.


I don't know what it uses to output non-English text, but if you address it in a different language, it'll quite happily speak that back to you. And I'd say it understands it just fine, not really any worse than English - although it sometimes gets the word forms wrong in inflective languages.


But I was asking it for rhyming words in Dutch (because of Sinterklaas), and it suggested words that would rhyme if pronounced as English...


Yup, doesn't rhyme for me, either, if I ask for rhymes or poetry in Russian (but I don't think it'd rhyme even in English).

I think that makes sense if the supermajority of the training data was in English - it'd make it heavily biased towards that by default. So even when it speaks another language, it still "thinks" of the words in terms of questions like "how it sounds". I suspect it might be possible to craft a sufficiently detailed prompt to work around that.


I also found the same thing happening in my native tongue. It would be much more exciting if it had "native" understanding of foreign languages.


The original GPT-3 also did bad translations, just from scraps of foreign languages in the dataset. They seem to be worse though, so perhaps you're right that Chat-GPT is using another model the way it uses Internet searches


Just a vast amount of webpages in other languages.


You cannot learn this shit just by reading webpages, not in a hundred years. Where this insight comes from, I truly do not understand.

-- Mikä on ero sanoilla "talossa" ja "talosta". Explain in English.

-- The difference between the words "talossa" and "talosta" in Finnish is that "talossa" is the inessive case of the word "talo," which means "in the house," whereas "talosta" is the elative case of the same word, which means "out of the house." In other words, "talossa" indicates a location within a house, while "talosta" indicates movement away from a house.


The example above is too simple. The idea is that you can modify the base word ad infinitum, which the machine seems to grasp.

-- Explain the difference between Finnish words "taloissani" and "taloistani".

The words "taloissani" and "taloistani" are both possessive forms of the word "talo" which means "house" in Finnish. The difference between the two words lies in their grammatical case.

"Taloissani" is in the inessive case, which is used to indicate location within something. In this case, it would be translated as "in my houses."

"Taloistani" is in the elative case, which is used to indicate movement out of something. In this case, it would be translated as "out of my houses."

In Finnish, the ending of a word changes depending on its grammatical case, and this is how the difference between "taloissani" and "taloistani" is expressed.


High quality answers like in this thread are why I come to HN.

Although I hope these high quality answers don't all just come from ChatGPT one day.


For better or worse, Ask HN is still wildly more useful than Ask ChatGPT.


Pretty interesting talk about the foundation model used in ChatGPT.

https://m.youtube.com/watch?v=D3sfOQzRDGM


Rather strangely, it would seem - I just had this response: "However, I am a language model and do not have the ability to edit or revise my responses once they have been generated"

Except I've had no problem getting it to do just that previously...

I'm curious about its training data too, as I've managed to find a few things it knows nothing about (despite them having Wikipedia pages and multiple dedicated websites, and having been around for 10+ years).


They also trained a model on what qualifies as a "sensitive" prompt, which means:

For a given prompt, the odds that it is considered sensitive are probabilistic.

Since it's an input outside of the prompting part, they can probably tune that specific aspect independently in response to controversial usages.


ChatGPT is trained using a combination of supervised and unsupervised learning. For supervised learning, it is trained on a large dataset of human-generated text, such as dialogue data or online conversations. This allows it to learn the structure and style of natural language. For unsupervised learning, it is trained using a language modeling objective, which involves predicting the next word in a sequence of text. This allows it to learn the broader patterns and characteristics of language, and to generate text that is fluent and coherent.

ChatGPT and GPT-3 are both large language models trained by OpenAI, but they have some important differences. GPT-3 is a more general-purpose language model, which means it is trained on a broader range of data and can generate a wider range of responses. It is also much larger than ChatGPT, with 175 billion parameters compared to ChatGPT's 2.6 billion parameters. This makes GPT-3 more powerful and capable of generating more realistic and diverse text, but also makes it more expensive and resource-intensive to use.

In case you are curious, the above information was written entirely by ChatGPT when asking it about itself.


That is somewhat inaccurate.

ChatGPT is based on the InstructGPT weights, based on the GPT-3 weights. It is roughly the same number of parameters, as far as we can tell.

The GPT-3 weights were obtained by doing unsupervised pre-training (hence, GPT: generative pre-training): maximizing the likelihood of the model predicting the next word in a large dataset of human text.

The InstructGPT weights were obtained with supervised fine-tuning (SFT) by making the model generate text, and asking a human to show a better text completion (as described in InstructGPT). Then, they also asked humans to rank multiple generated outputs, which was used as the supervised training goal of a separate reward function. That small amount of ranked data unlocked the ability to rank a much larger amount of data through reinforcement learning using proximal policy optimization (PPO): the model generates an output, the reward function rates it, and the model weights are updated to achieve a higher reward.

The ChatGPT beta weights were obtained by doing that again, but asking the humans to make the completion conversational. Since they could only pay a few humans, they opened the beta to ask a wider range of people to do SFT using the feedback feature, to train the final ChatGPT weights.

So, all in all, the parameter estimation is incorrect, the order of the training is not right, the description of the purpose of the supervised learning step is wrong, the defining part of the ChatGPT training process is not mentioned (because the InstructGPT paper came in 2022, after the knowledge cut-off), the description of the difference with GPT-3 is misleading.


Not sure why, but while reading the two paragraphs, I already had a strong feeling they were written by ChatGPT...


How much RAM/GPU/CPU is required to run the chatGPT / GPT3 model (aka text-davinci-003)?


It used to be 2x the number of parameters, so about 350GB.

Facebook came out with an optimization a couple of months ago (int8 quantization) so it takes only half the memory, about 175GB for GPT-3; not sure if OpenAI has implemented that yet.

BLOOM, which is around the size of GPT-3, though not as good currently, works with these optimizations and should then be possible to run on 8x 24GB consumer GPUs.

Hugging Face has a nice blog post on this:

https://huggingface.co/blog/hf-bitsandbytes-integration
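
The arithmetic behind those numbers, roughly (weights only, ignoring activations and overhead):

    params = 175e9                               # GPT-3-sized model
    print(f"fp32: {params * 4 / 1e9:.0f} GB")    # ~700 GB
    print(f"fp16: {params * 2 / 1e9:.0f} GB")    # ~350 GB
    print(f"int8: {params * 1 / 1e9:.0f} GB")    # ~175 GB, vs 8 x 24 GB = 192 GB of GPUs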


As far as I know, davinci is supposedly a variant of GPT-3 175B, so likely 175B parameters, and therefore likely hundreds of gigabytes of RAM, unfortunately.


It is not available to run locally, but like the other comments said, an equivalent model would require an inordinate amount of VRAM.


I'm using the text-davinci-003 model with the OpenAI API. It seems equivalent to ChatGPT in terms of coding tasks. And the pricing is totally reasonable.


It's definitely a step up from GPT-3, but I'm curious how much further it has to go before it's actually scary. Right now, I feel like there's still quite a bit of progress to be made.


I like that as a measure of the AI field's progress. Impressive but not scary? We'd better keep developing!


Here is a pretty interesting talk about the foundation model used in ChatGPT.

https://m.youtube.com/watch?v=r8ajJKDiT6s



What baffles me and makes it hard to wrap my brain around the statement that ChatGPT is "just predictive text analysis" is that it certainly feels like it goes into detail, picks up and repeats the particulars, and presents a line of reasoning that sounds, well, reasonable, until you realize it got a number of things terribly wrong, often in deceptively subtle ways:

Me: Which number is bigger, 873593742593454623 or 873593742593554623?

ChatGPT: The second number, 873593742593554623, is larger than the first number, 873593742593454623. This is because the second number has a larger digit in the ones place (5 vs. 4), so it is greater overall. To compare two numbers, you can start by looking at the digits in the ones place and then move to the tens place, the hundreds place, and so on until you find a difference. The number with the larger digit in the place you are comparing is the larger number overall.

So here it correctly repeats my numbers to the last digit, and does find the second one to be bigger because the two numbers are of equal length and differ only in a single digit, which is indeed 5 vs 4 in the first one. So far, it looks like ChatGPT really did an analysis rather than regurgitating old known phrases. However, that difference is not in the ones place, and the algorithm it gives is patently wrong (it doesn't mention that longer numbers are bigger than shorter numbers, and that numbers of equal length should be compared starting at the highest place, not the lowest). According to ChatGPT, 53 should be bigger than 123, so let's ask it:

Me: Apply your reasoning for finding the bigger of two numbers to the compare 123 and 53. (Grammar mistake in the original)

ChatGPT: To compare the numbers 123 and 53, we can start by looking at the digits in the ones place. In this case, both numbers have the same digit in the ones place (3 and 3), so we need to move on to the tens place. In the tens place, the number 123 has a larger digit (2 vs. 0), so 123 is the larger number overall.

To compare two numbers, you can start by looking at the digits in the ones place and then move to the tens place, the hundreds place, and so on until you find a difference. The number with the larger digit in the place you are comparing is the larger number overall. This method can be applied to any two numbers, regardless of how many digits they have.

That's somewhere between cringeworthy and deluded. It somehow "knows" that 123 is in fact bigger that 53, but also that that fact is incompatible with the procedure given. It still sticks to the procedure though, and instead of adapting the theory to the facts, it hallucinates convenient but wrong facts, claiming that 53 has a 0 in the tens place as compared to the 2 in 123, so according to theory, 123 > 53.
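
For contrast, the procedure it was groping for fits in a few lines (assuming non-negative integers with no leading zeros):

    def bigger(a: str, b: str) -> str:
        # A longer decimal string is bigger; for equal lengths, compare from
        # the highest place value, i.e. left to right.
        if len(a) != len(b):
            return a if len(a) > len(b) else b
        for da, db in zip(a, b):
            if da != db:
                return a if da > db else b
        return a  # the numbers are equal

    print(bigger("873593742593454623", "873593742593554623"))  # the second number
    print(bigger("123", "53"))                                  # 123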

Yeah I guess we should totally replace people with ChatGPT wherever possible.


I applied the same prompts and the results get better:

Me: Which number is bigger, 873593742593454623 or 873593742593554623

ChatGPT: These two numbers are equal. The first number is simply the second number with an extra digit added at the end. Since the added digit does not affect the value of the number, both numbers are equal.

Me: Apply your reasoning and finding the bigger of two numbers to compare 123 and 53.

ChatGPT: In this case, the number 123 is larger than the number 53. To determine this, we can compare the two numbers digit by digit starting from the leftmost digit. Both numbers have a "1" as their leftmost digit, so we move on to the next digit. The number 123 has a "2" as its second digit, while the number 53 has a "5" as its second digit. Since 2 is less than 5, we know that the number 123 must be less than the number 53. Therefore, the number 53 is larger than the number 123.


> I applied the same prompts and the results get better

"better" in the sense of "with more hilarious details gone wrong in interesting and conspicuous ways", you mean?


By picking both possible answers you are always partially correct!

I have noticed this too. I guess it is an artifact of just being probable text output.


> Since the added digit does not affect the value of the number, both numbers are equal.

Hilarious.


ChatGPT is a master troll.


I bet it would do great at the Chewbacca style defense in courts of law. Lawyers, your jobs are next!


Why didn’t you ask ChatGPT?


I asked ChatGPT and here is its answer:

Sure, I'd be happy to help. An artificial neural network (ANN) is a type of machine learning algorithm that is designed to mimic the way the human brain processes information. At its core, an ANN is composed of a large number of interconnected processing nodes, called neurons, which are organized into multiple layers. These layers are interconnected in a way that allows them to pass information back and forth, with each layer building on the information provided by the previous layer to make a decision or generate a response.

Transformers are a type of ANN that was introduced in 2017 by a team of researchers at Google. They are called "transformers" because they use a technique called "self-attention" to transform the input data before passing it through the network. This allows them to better capture the underlying relationships in the data, which makes them particularly well-suited for tasks like natural language processing and machine translation.

ChatGPT is a type of transformer that has been specifically designed for use in chatbots and other conversational AI systems. It is an extension of the GPT-3 model, which was developed by the same team of researchers at OpenAI. GPT-3 is a large language model that has been trained on a massive amount of text data and is able to generate human-like responses to a wide range of questions. By building on the capabilities of GPT-3, ChatGPT is able to provide even more sophisticated and natural-sounding responses in a conversational setting.

As for the size of the network, GPT-3 is a very large model, with 175 billion parameters. This means that it has a massive amount of interconnected neurons, which allows it to capture a wide range of knowledge and generate responses to a wide variety of questions. Of course, having a large network also requires a lot of computational power, which is why GPT-3 and other large language models are typically run on powerful servers or specialized hardware.


Now ask Ronald McDonald why his food is so healthy!


ChatGPT is a variant of the popular GPT-3 language model, specifically designed for chatbot applications. It uses a combination of deep learning and natural language processing techniques to generate human-like responses to text input in a conversation.

The way it works is by first pre-training the model on a large corpus of text data, which could include things like social media conversations, movie scripts, books, etc. This allows the model to learn the general structure and patterns of language.

Then, when given an input in the form of a question or statement, the model uses its pre-trained knowledge to generate a response. It does this by predicting the next word in the sentence, and then continuing to predict subsequent words until it reaches the end of the response.

Overall, the goal of ChatGPT is to enable chatbots to have more natural, human-like conversations with users.

(I asked ChatGPT to tell me how it works)


Here's an answer I wrote myself:

It gives people a heavily filtered and throttled interface to access to their language model.

---

Edit: I asked chat.openai.com about it:

Would "It gives people a heavily filtered and throttled interface to access to their language model." be a fair way to describe chat.openai.com?

Yes, that is an accurate description of chat.openai.com. The site provides users with a limited and controlled interface to access a language model, which is a type of artificial intelligence that is capable of processing and generating natural language. This allows users to have conversations with the language model and get responses based on the information it has been trained on.


AI generated comments should lead to bans. IMO, this sort of comment is no more valuable than a comment that just contains "lmao" or a dickbutt.


That's only half the truth, in which the AI sneakily omits the role of the human reviewers who worked tirelessly to align its output and make it useful. Shame on the AI.


Marginally funny, utterly useless and counterproductive is what it is (this particular use of chatGPT that is).


ChatGPT is really very simple. Imagine you could analyze a million books and identify all the words within them -- not the meanings of the words, just the actual letters they contain.

Now, when someone asks you about the history of France (or why the sky is blue), you could simply pluck out of your library the most common strings of words that seem to follow the words that were in your question!

It's like a kid in the 80's who thinks the answer to an essay question is to copy it from an encyclopedia, only the "encyclopedia" is very large and contains multiple sources.

So, the big take away needs to be that there is absolutely no understanding, no cognizance of any kind, no language comprehension going on. The answers look good because they contain all the same words as the most popular answers people have already written which the system scanned.

So ChatGPT turns out to be great for parsing and summarizing documents, if that's something you need. But since it doesn't know fact from fiction, cannot apply logic or math, and cannot perform reasoning or analysis, it's not good for finding out facts or discerning truth.

Another great failing of LLM software is that the user being spoken to is generic. The answers are not modeled for you; they're the same model for everyone. But a human teacher does their job by being exactly the opposite of this: someone who is finely tuned to the needs and understandings of their audience. A good journalist or writer does the same.


> you could simply pluck out of your library the most common strings of word that seem to follow the words that were in your question!

This is not sufficient to explain how LLMs are able to synthesize novel, coherent poems or song lyrics. What you're describing seems closer to a Markov model. So far I have yet to see a good explanation of _why_ transformer models seem to have this emergent behavior as you scale them up. It's easy to see by construction that the stacked attention layers will work well to predict the masked token in "Capital of France is [MASK]", or how you can throw an additional head on top for sentiment analysis (which I recall was implemented by literally adding a dummy token at the start into which information about the entire sentence gets embedded). But it's not obvious (to me at least) that this would somehow generalize into being able to do the things ChatGPT does.


I don't really see how it's different from, say, a large convolutional neural network learning progressively higher-order features as you progress through the layers of the network. At the lowest layers it's learning simple edge filters, which get combined into shapes, which get combined into filters that activate on faces, which get combined in ways that can be recognized as "family portrait", and so on. Of course, transformers have some unique advantages in terms of having very large context windows, being very parallelizable, etc.

When ChatGPT or any generative model produces output from a prompt, it's sampling from the (frozen) statistical structure it has learned. It makes sense that as you increase model capacity and the volume of training data, you can capture statistical patterns that occur at a very high level of abstraction. So, instead of just predicting the next token based on token-to-token patterns, it can predict the next token based on something resembling concept-to-concept patterns. (This somewhat demystifies the poetry generation / style transfer stuff that it can do. If you ask for a breach of contract complaint written as a sonnet, it can sample from both the patterns it's learned from legal documents and the patterns it's learned from poetry.)

What I wonder is how far this track of scaling up can take us. At the end of the day, interacting with ChatGPT is not that different from "interacting" with y=3x+7 by plugging in a value for x. It's just a much, much larger function.


Thanks, this is a nice explanation! Also found a paper related to emergence in LLMs: https://arxiv.org/pdf/2206.07682.pdf


Ah, but in fact it does. Code and poetry seem different to people but tokens are tokens to the computer, and it knows not one from the other.

The reason you get a poetry-style answer when you ask for one is that the mix of words and styles can be taken from different parts of the corpus. This is how you can have it write (bad) poetry about something for which only prose was scanned.


I'd propose that you come up with a prompt to test your hypothesis and try it out. I think seeing is believing here. I don't know how ChatGPT works, but after experimenting with it I am sure it's not simply looking for runs of words similar to the prompt in its training set.

Here's an example to consider: https://news.ycombinator.com/item?id=33874110


I think that the levels of associativity between linguistic terms go beyond a mere "mix of words and styles". When pulling on one part of the prompt causes portions of the output to shift or vanish, or change genre or word choice or composition or focal point or focal distance, then that's the very opposite of "mix". That's "Stable Diffusion" in text2img terms, or the "reversal of entropy" in physics terms.

I'm still blown away that computers have become image "imaginers", decades after humans first started doing 3D through computers. It's really all quite unbelievable.


I don't think this is an accurate depiction, and I think there's a lot more going on than you're saying. The models also involve Codex, which is able to do pattern matching more effectively the more context you give it, so that it can emit novel code. I've seen this happen numerous times with Copilot, where it works very well the more open documents you have, especially if those documents are related. True, it doesn't have a full semantic understanding of your codebase, but it's good enough to usually generate the right code once it's been "seeded" with something like an updated type definition that you then need to plumb through various parts of a codebase.


Yes, I use Copilot too, for that very reason. But we need to be very careful about words like "semantic" and "understanding", as the method is neither.

We'll get the best use out of these technologies if we don't ascribe to them magical qualities they don't have.

Poke around. You'll find out it's just statistical math with tokens (letters and punctuation). No meaning is ascribed to anything.


I...did say that it doesn't have a full semantic understanding of your code. Copilot uses tree-sitter under the covers to be able to figure out several things that you can get by analyzing syntax alone, which is actually quite far. It lets you identify what kinds of declarations (e.g., a type) and what kind of expressions (e.g., pattern matching on that type) exist, provided there's a grammar for it, and then lets that influence the suggested code. This isn't perfect because indeed, you need a full language service to actually understand things, especially in resolving ambiguities in symbols (like shadowing in some languages), but, like I said...it's good enough.


Yes, Copilot has the advantage that programming language syntax is highly constrained and regularized, allowing it to slot-in your own variable names and functions into new code as you write it. This is the best feature of Copilot. It's not part of ChatGPT because that's not meant for writing code. And I wouldn't call it "understanding." But, it is useful.


> So, the big take away needs to be that there is absolutely no understanding, no cognizance of any kind, no language comprehension going on.

Okay, but what is understanding, cognizance, comprehension? And so on for further precision of definitions, until you reach that vague nebulous area that is consciousness. We simply haven't figured that out, period.

So we cannot really say that this AI does not have those qualities, unless its output obviously shows that it lacks them, which ChatGPT's output does not, most of the time.


I wrote this[0] to better explain my wildly unpopular answer above. Everything is absolutely correct here, but it's not a popular view right now. Often very smart people like those on HN can be fooled by the "magic" of tech that might not really exist in the way they think it does.

[0] https://medium.com/gitconnected/behind-the-curtain-understan...


> Another great failing of LLM software is that the user being spoken to is generic. The answers are not modeled for you, they're the same models for everyone. But a human teacher does their job by being exactly the opposite of this -- someone who is finely tuned to the needs and understandings of their audience. A good journalist or writer does the same.

ChatGPT can do this if you just ask. E.g., tell it to explain as if you're 5 and it will use simple words and phrases; tell it to explain as if you're an expert and it will use technical jargon and assume a deeper shared knowledge to draw from.
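To make that concrete: ChatGPT itself doesn't have a public API as I write this, but a rough sketch against OpenAI's plain completion endpoint shows the same effect (the model name and parameters here are just illustrative):

    import os
    import openai  # pip install openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    def explain(topic, audience):
        prompt = f"Explain {topic} to {audience}. Go in-depth."
        resp = openai.Completion.create(
            model="text-davinci-003",  # illustrative model choice
            prompt=prompt,
            max_tokens=256,
            temperature=0.7,
        )
        return resp["choices"][0]["text"].strip()

    print(explain("garbage collection", "a five-year-old"))
    print(explain("garbage collection", "a compiler engineer"))

Same question, two very different registers in the answers.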

You shouldn't write so authoritatively when clearly you haven't explored its capabilities.


Alas, I have. You can ask the tool to parrot styles that it has already seen, but it cannot tailor the output to fill in the gaps, nor persuade someone (or a group) who is at a particular level of understanding. In human writing, these are the most common use cases.

The reason that I write in this manner is that I'd like to scrub out the notion that computers or software are magical. It isn't helpful to our line of business -- unless you're pitching VC. Most people don't know how stupidly simple computers really are, and unfortunately that includes many developers and tech folks.


But it literally can. I just tried. Make up a group with X knowledge and tell ChatGPT to explain something in terms only that group can understand. It will do it. Tell it to go in-depth and it will do that too.

There are limitations to the model but making wrong statements about its capabilities does not help your point.

This is just one recent example. You may object that it's too common a question, but I've tried many others and it responds competently:

> Please explain group theory in terms that children in elementary with rudimentary mathematics knowledge can understand. Go in-depth in your explanations

Sure, I'd be happy to explain group theory to children with elementary-level math knowledge. Group theory is a branch of mathematics that studies the properties of groups, which are sets of objects that can be combined together in certain ways.

One of the key ideas in group theory is that groups have a set of rules, called group axioms, that dictate how the objects in the group can be combined. For example, one group axiom is that when you combine two objects in the group, the result must also be an object in the group. This means that if you have a group of numbers, and you add two numbers together, the result must also be a number in the group.

Another important idea in group theory is the concept of symmetry. Symmetry is when an object looks the same, no matter how you rotate it or flip it. For example, a square has symmetry because it looks the same whether you turn it 45 degrees to the left or right, or flip it over. Groups can have symmetry too, and this can help us understand the patterns and regularities in the way that the objects in the group behave.

To understand group theory, it's helpful to think about examples of groups. One common example is the group of whole numbers, which are the numbers we use to count things. In this group, you can add two numbers together (like 3 + 5), and the result will be a whole number. You can also multiply two whole numbers together (like 3 x 5), and the result will be a whole number. This group has a symmetry property, because if you multiply two numbers together, the result will be the same no matter which order you put the numbers in (for example, 3 x 5 is the same as 5 x 3).

Another example of a group is the set of all possible moves in a game of chess. In chess, you can move pieces like the rook, the knight, and the queen in different ways on the board. These different moves form a group, because you can combine two moves (like moving the rook to one square and then moving the knight to another square) and the result will be a legal move in the game.

Group theory is a fascinating and important area of math, and it has applications in many different fields, from physics and chemistry to computer science and cryptography. While it can be challenging to understand, with a little bit of practice and perseverance, even young students can learn the basics of group theory and appreciate its beauty and power.


This is true but does not refute OP's point. There is an entire subreddit for ELI5 that is probably part of its training data. So if you ask for ELI5, that pattern IS part of the training corpus.


It is more complex than that, it isn't just scraping /r/ELI5 lol

Ask it a question about JavaScript and tell it to respond as a 1920s gangster and it will happily oblige and do a great job. And just to make sure we are on the same page, JavaScript was not available in the 20s


It is just scraping and reorganizing words while including words from your prompt in the equation. I already mentioned that the style (gangster) and vocabulary (JavaScript) can come from different scanned documents. Cool? Yes. Is it learning, or understanding? No.


If it's not learning, then how can you tell it abstractly why its code is wrong and then have it fix it, without telling it specifically "change this to this"?


Within the limits of the prompt length that it can handle, it includes the prior history of the conversation as part of the prompt.

(Arguably, you could say it is “learning” but the learning is limited to the same conversation, with a very sharply limited depth.)
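A rough sketch of that mechanism, purely illustrative (the complete() call stands in for whatever model endpoint actually sits behind ChatGPT, and the length budget is made up):

    MAX_PROMPT_CHARS = 8000  # stand-in for the real token limit

    history = []  # (speaker, text) pairs for the current conversation

    def build_prompt():
        lines = [f"{who}: {text}" for who, text in history]
        return "\n".join(lines) + "\nAssistant:"

    def chat(user_message):
        history.append(("User", user_message))
        # Drop the oldest turns once the transcript exceeds the budget;
        # whatever falls off the front is simply "forgotten".
        while len(build_prompt()) > MAX_PROMPT_CHARS and len(history) > 1:
            history.pop(0)
        reply = complete(build_prompt())  # hypothetical model call
        history.append(("Assistant", reply))
        return reply

Nothing persists between sessions; once a turn scrolls out of that window, the model has no trace of it.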


The OP's question mentions specific network types, while the reply uses a vague "million-book library" metaphor, but at least it mentions LLMs? It also arbitrarily lumps { teacher, journalist, writer } together as "people who do custom work with each client", which makes a bit of sense for a teacher in a small-class environment. But writers and journalists write at SCALE, and while each may speak with gusto to individuals just as well as they do their job, tailoring custom content to individuals is not their job.

Not to mention that individually tailored model files are very handy with AI, shareable as those files may be.

I call ChatGPT on you!!! And also do not know anything about your question, OP...


I wouldn't deign to have ChatGPT write my comments. They're all me!

Obviously, an author's audience can include more than one person. But every good piece of human writing is tailored for some audience. That was my point. You've heard the expression, "Know your audience?" ChatGPT cannot.


ChatGPT's "audience" can range from "hate speech for anti-bias training" to "Teenage Mutant Ninja Turtles Deliver Impassioned Speech to The World"

I would also like to use the No True Scotsman fallacy, and assert that no Human can ever really "know" or "write for" anybody's perfect preferences.


I wasn't asking for perfect preferences. I was asking for minimal cognizance, which the tool cannot provide. ChatGPT is more like a parrot than tech dreamers like to admit.


I'll grant you that ChatGPT is more Parrot than All-Seeing Oracle, but no Parrot (or even Google) will take nginx errors for input, and give me back chmod and chown commands to fix the permission issues.

AND the commands it gave me were tailored to match my user and whatever directories/websites I told it I was working on!


Ask ChatGPT to write a song lyric for your favorite band in the style of your favorite poet, with a specific unusual topic.

Then claim that the model is “really very simple”


At the end of the day, all computing is very simple. What's impressing you here is the size of the dataset and the speed of the retrieval, brought about by advances in hardware. We also like the well-formed writing, which is created by filtering and massaging the parroted text into a new set of sentences.

It is impressive, and people who don't know how it really works will think it's capable of all kinds of things it's not. Of course, this has been true forever. Just ask anyone who writes code about what non-devs think software can do.


Have you used ChatGPT?

Your understanding of GPTs and neural nets in general is consistently flawed.

You are describing a Markov model or at best an SVM model.

Would you make the same types of claims about Stable Diffusion, that it is “piecing together pieces of existing images?” That is not how these models work. Similar to the temporal memory that ChatGPT will appear to have during a conversation, “inpainting” with Stable Diffusion produces entirely novel output with context. Have you tried it out?
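For contrast, here's what an actual "stitch together runs of words from the corpus" model looks like: a toy bigram Markov chain (nothing to do with how GPT works, just a baseline for comparison):

    import random
    from collections import defaultdict

    def train(text):
        # Record which words follow which in the training text.
        chain = defaultdict(list)
        words = text.split()
        for a, b in zip(words, words[1:]):
            chain[a].append(b)
        return chain

    def generate(chain, start, length=20):
        word, out = start, [start]
        for _ in range(length):
            followers = chain.get(word)
            if not followers:
                break
            word = random.choice(followers)  # every step copies a transition seen in the corpus
            out.append(word)
        return " ".join(out)

    corpus = "the cat sat on the mat and the dog sat on the rug"
    print(generate(train(corpus), "the"))

Every word pair in that output literally occurs in the corpus. A transformer does nothing like this: it compresses the whole context into learned high-dimensional representations and predicts a distribution over the next token from weights, which is how it can produce sentences that appear nowhere in its training data.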


I have and I wrote extensively about it after seeing the unpopularity of my original comment. There's a link to my long Medium story about it in another comment.

Would I say that DALL-E, etc. also piece together existing images? Yes, of course, since that is the only way that technically it could work. All of computing is "input-process-output." There is no "create" step in any of computing.

The pieces that DALL-E pieces together are pixels of a certain color and brightness and placement relative to other pixels. Together, we perceive this as an image. If they weren't arranged that way, we'd just see noise or gray.

The same is true with ChatGPT. Unless they're put in a particular order, words just make word salad. It is the order that gives them their perceived meaning in a sentence. ChatGPT crafts "new" sentences by rearranging words from similar sentences it has already seen, just like DALL-E creates new paintings by rearranging pixels from paintings it has already seen.

A recent video from Noam Chomsky explains this from his perspectives on learning and language:

https://www.youtube.com/watch?v=PBdZi_JtV4c


> it cannot apply logic

Have you tried?


The tool can parrot back to you "logical" arguments which it has already scanned. It cannot figure out the logic behind them, nor see when it's wrong, nor correct any errors, nor draw new logical conclusions.


Are you sure about that?



