Are we in an AI Overhang? (lesswrong.com)
261 points by andyljones on July 27, 2020 | 274 comments


I've been wondering about something: (I only know the basics of AI, this might be kinda incoherent)

Right now if you look at GPT-3's output it seems like it's approaching a convincing approximation of a bluffing college student writing a bad paper: correct sentences and stuff, but very 'cocky'. It cannot tell right from wrong, and it will just make up convincing rubbish, 'hoping' to fool the reader. (I know I'm anthropomorphizing but bear with me).

Current models are being trained on a huge amount of internet text. As smarmy denizens of hackernews we know that people are very often wrong (or 'not even wrong') on the internet. It seems to me that anything trained on internet data is kinda doomed to poison itself on the high ratio of garbage floating around here?

We've seen with a lot of machine-learning stuff that biased data will create biased models, so you have to be really careful what you train it on. The dataset on which GPT-n has to be trained is presumably huge(?); moderation is hard(?) and doesn't scale; it's easier to generate falsehood than truth; and the further we go along, the more internet data will be (weaponized?) output of GPT-(n-1). So won't the arrival of AGI just be sabotaged by the arrival of AGI?

Has anyone written something about the process of building AGI that deals with this?


I don't think the problem is really with the data set, it's that GPT-3 doesn't actually have any understanding of the data. It's building a model of the text input to generate text output. It's not building a model of what the data means, or what it represents.

When GPT-3 writes a scientific paper it's not trying to test a premise or critically evaluate some data, it's trying to generate text that looks like that sort of thing.

It doesn't matter how many excellent quality papers you fed it, or how stringently you excluded low quality input data, it would still only be trying to produce facsimiles. It wouldn't actually be trying to do the things a real conscientious scientist is trying to do when they write a paper. Arguably, at best it might be trying to do what a deceitful scientist, trying to get credit for a paper with spurious results and no actual scientific merit, might be doing when writing a paper that looks plausible, but that's actually a completely different activity.

The code generation examples are really interesting. Here it's being used to generate real working code that has actual value, and it appears to work pretty well. Its code often has bugs, but whose doesn't? It only works for fairly short, precisely definable coding tasks though. I don't think scaling it up to more complex coding problems is going to work. Again, it doesn't understand the meaning of anything. It's trying to produce code that looks like working code, not actually solve the programming task you're giving it. It doesn't even know what a program is or what a programming task is. It doesn't know what input and output are. For example, you can't ask it to modify existing code to change its behaviour. It doesn't know code has behaviour. It has no idea what that even means and has no way to find out or any route to gaining that capability, because that's not a text transformation task and all it does is transform text.


I've seen this objection raised a lot, but I think it betrays a misunderstanding of what GPT-3 is capable of doing.

The best, in fact the only, way to generate truly convincing text output on most subjects is to understand, on some level, what you're writing about. In other words, to create a higher level abstraction than simply "statistically speaking, this word seems to follow that one". Once you start to encode that words map to concepts, you can use the resulting conceptual model to create output which is conceptually consistent, then map it backwards to words. This is what humans do with sensory data, and there is good evidence that GPT-3 is doing this too, to some degree.

Take simple arithmetic, such as adding two and three digit numbers. GPT-2 could not do this very successfully. It did indeed look like it was treating it as a "find the textual pattern" problem.

But GPT-3 is much more successful, including at giving correct answers to arithmetic problems that weren't in its training set.

So what changed? We aren't sure, but the speculation is that in the process of training, GPT-3 found that the best strategy for correctly predicting the continuation of arithmetic expressions was to figure out the rules of basic arithmetic and encode them in some portion of its neural network, then apply them whenever the prompt suggested to do so.

If this is the case, and it remains speculation at this point, would you still argue that GPT-3 doesn't "understand" arithmetic, on some level? I would argue that this abstraction, this mapping of words onto higher-level concepts, which can then be manipulated to solve more complex problems, is exactly what intelligence is, once you strip away biologically-biased assumptions.
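
To make that concrete, here is roughly the kind of probe involved, sketched with the freely available GPT-2 via Hugging Face (GPT-3 itself sits behind OpenAI's API); the prompt format is my own guess, not the paper's exact setup:

    # A rough sketch of a few-shot arithmetic probe, using GPT-2 since it is
    # openly available. As noted above, GPT-2 mostly fails this; the point is
    # the shape of the probe, and that the test sum is unlikely to appear
    # verbatim in the training data.
    import random
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    def make_prompt(a, b, shots=3):
        lines = []
        for _ in range(shots):
            x, y = random.randint(100, 999), random.randint(100, 999)
            lines.append(f"Q: {x} + {y} = {x + y}")
        lines.append(f"Q: {a} + {b} =")
        return "\n".join(lines)

    prompt = make_prompt(487, 256)
    out = generator(prompt, max_new_tokens=8, do_sample=False)[0]["generated_text"]
    continuation = out[len(prompt):].strip()
    print("model said:", continuation.split()[0] if continuation else "(nothing)")
    print("correct answer:", 487 + 256)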

Certainly, at this point GPT-3's conceptual understanding remains somewhat primitive and unstable, but the fact that it exhibits it at all, and sometimes in spookily impressive ways, is what has people excited and worried. We have produced AIs that can perhaps think conceptually about relatively narrow topics like playing Go, but we have never before created one that can do so on such a wide range of topics. And there is no suggestion that GPT-3's level of ability represents a maximum. GPT-4 and beyond will be more powerful, meaning they can mine more and more powerful conceptual understanding from their training data.


> We aren't sure, but the speculation is that in the process of training, GPT-3 found that the best strategy for correctly predicting the continuation of arithmetic expressions was to figure out the rules of basic arithmetic and encode them in some portion of its neural network

I don't mean to attack you personally, but this is a perfect example of what I feel is wrong with so much neural network research. (And I understand that you are just commenting in a discussion, not conducting research.)

In a word, it's baloney. And it's a really common pattern in neural networks' recent history: "How did they perform reasonably well on this task? We aren't sure, but the speculation is that they magically solved artificial general intelligence under the hood." Usually this is followed up by "I don't know how it works, but let's see if a bigger network can make even prettier text." Meanwhile, "it's funny how our image classifiers grossly misperform if you rotate the images a little or add some noise."

A rigorous scientific approach would be aimed at actually figuring out what these models can do, why, and how they work. Rather than just assuming the most optimistic possible explanation for what's happening -- that's antithetical to science.


>In a word, it's baloney.

This is where you lost me. They included important caveats to indicate not being sure, which is important to me as an indication of healthy skepticism. And you substituted a specific example (making inferences about arithmetic) for a more expansive, uncharitable, easy-to-caricature claim of "gee we must have solved general AI!", which is much easier to attack. And, unlike your counterpart, who hedged, you just went ahead and categorically declared it to be baloney, making you the only person to take a definitive side on an unsettled question before the data is in. This is a perfect example of the anti-scientific attitude exhibited in Overconfident Pessimism [0].

I don't think it's known how GPT-3 got so much better at answering math questions it wasn't trained on, but I do think the explanation that it made inferences about arithmetic is reasonable. I think the commenter added all the qualifiers you could reasonably ask them to make before suggesting the idea, and frankly I would disagree that there's some sort of obvious history of parallels that GPT-3 can be compared to.

There is an interesting conversation to be had here, and there probably is much more to learn about why GPT-3 probably isn't quite as advanced as it may immediately appear to be to those who want to believe in it. But I think a huge wrench is thrown in that whole conversation with the total lack of humility required to confidently declare it 'baloney', which is the thing that sticks out to me as antithetical to science.

0: https://www.lesswrong.com/posts/gvdYK8sEFqHqHLRqN/overconfid...


Thanks for your reply. A couple responses to advance the conversation.

As a side note, it's worth mentioning that, judging from other responses, we seem to have little idea how much arithmetic GPT-3 has actually learned, and it may not be much.

Anyway, I think the important distinction between my perspective and Overconfident Pessimism, which you attribute to me, is that I'm not talking about (im)possibility of achievement, I'm talking about scientific methodology or lack thereof.

In other words, I'm not saying (here) that some NLP achievements are impossible. I'm saying that we are not rigorously testing, measuring, and verifying what we are even achieving. Instead we throw out superficially impressive examples of results and invite, or provoke, speculation about how much achievement probably must have maybe happened somewhere in order to produce them.

We have seen several years of this pattern, so this is not a GPT-3 specific criticism; it's just that particular quote so neatly captured patterns of lack of scientific rigour that we have seen repeatedly at this point.

Probably the first example was image recognition. Everyone was amazed by how well neural nets could classify images. There was a ton of analogous speculation -- along the lines of 'we're not sure, but the speculation is the networks figured out what it really means to be a panda or a stop sign and encoded it in their weights.' The terms "near-human performance" and then "human-level performance" were thrown around a lot.

Then we found adversarial examples and realized that e.g. if you rotate the turtle image slightly, the model becomes extremely confident that it's a rifle. So obviously it has no understanding of what a turtle or a rifle is. And obviously, we as researchers don't understand what those neural nets were doing under the hood, and that speculation was extremely over-optimistic.

Engineering cool things can absolutely be a part of a scientific process. But we have seen countless repetitions of this pattern (especially since GANs): press releases and impressive-looking examples without rigorous evaluation of what the models are doing or how; invitations to speculate on the best-possible interpretation; and announcing that the next step is to make it bigger. I think this approach is both anti-science and misleading to readers.


> And it's a really common pattern in neural networks' recent history: "How did they perform reasonably well on this task? We aren't sure, but the speculation is that they magically solved artificial general intelligence under the hood." Usually this is followed up by "I don't know how it works, but let's see if a bigger network can make even prettier text."

Layperson here, but my impression is that "let's see if a bigger network can make even prettier text" has _worked_ far beyond the point most people expected it would stop working.

Also my layperson impression: most "researchers" that are on the cutting edge of cool things are more interested in seeing what cool things they can do than on doing rigorous science (which makes sense -- if you optimize for rigorous science, your stuff probably isn't as flashy as the stuff produced by people optimizing for flash).


> what cool things they can do than on doing rigorous science (which makes sense -- if you optimize for rigorous science, your stuff probably isn't as flashy as the stuff produced by people optimizing for flash.)

Is this a new iteration on that zigzag quote?

> Zak phases of the bulk bands and the winding number associated with the bulk Hamiltonian, and verified it through four typical ribbon boundaries, i.e. zigzag, bearded zigzag, armchair, and bearded armchair.

From "The existence of topological edge states in honeycomb plasmonic lattices"

https://iopscience.iop.org/article/10.1088/1367-2630/18/10/1...


I don't know if this means something is truly wrong. AI is a mix of engineering and scientific research, just like most CS subfields. Recently, the emphasis has shifted towards engineering, as the applications of neural nets have skyrocketed after a few breakthroughs in performance.

It's similar to computer systems research. For example, a research paper on filesystems might tell us a simple trick which leads to better performance on NVMM. The paper may go into why the trick works, but it doesn't (and shouldn't need to) generalize and try to improve our general understanding of how to design filesystems on different hardware. We've been designing filesystems for decades and, well, we're still guessing about which approaches to use and hoping for the best. In the same vein, we don't even have a widely-accepted theory of how to use data structures yet.

So I don't think the fact that neural nets aren't scientific enough means it's all BS. We have gaps in understanding, but the power of the models warrants a lot of continued work on finding useful applications.

Doesn't mean I don't think AI is over-hyped/overfunded though...


I agree with a lot of this, but I think there is a consistent pattern of AI announcements playing on humans' intuitions to create the impression that much more has been achieved than can actually be proven -- in fact, not even trying to prove anything. Part of this is that the researchers are humans too and may be misled themselves. But a rigorous research process would at least try to prevent that.

For example, people once thought playing chess was hard. So they thought that if a computer could beat the world champion, then computers would probably also be able to replace every job and so on. If you sent Deep Blue back in time to the 1960s, they wouldn't understand how it works, so they'd probably assume that since it could beat Petrosian at chess, it could probably drive cars and treat disease.

But then we built Deep Blue and realized that you don't need AGI to play chess; a very specialized algorithm will do it.

So we're like people in the 70s who've been handed Deep Blue. It's irresponsible, in my opinion, to over-hype it when we have no idea how it works.


Wait, you think AI is overfunded?


> Meanwhile, "it's funny how our image classifiers grossly misperform if you rotate the images a little or add some noise."

Same thing arguably happens with humans with rotation. Our eyes even rotate about the roll axis to keep things upright relative to gravity. Most people can draw faces more accurately when copying from an upside-down face than a right-side-up one.


> So what changed? We aren't sure, but the speculation is that in the process of training, GPT-3 found that the best strategy for correctly predicting the continuation of arithmetic expressions was to figure out the rules of basic arithmetic and encode them in some portion of its neural network, then apply them whenever the prompt suggested to do so.

I saw a lot of basic arithmetic in the thousands range where it failed. If we have to keep scaling it quadratically for it to learn log n scale arithmetic then we're doing it wrong.

I'm surprised you think it learned some basic rules around arithmetic. A lot of simple rules extrapolate very well, into all number ranges. To me it seems like it's just making things up as it goes along. I'll grant you this though, it can make for a convincing illusion at times.


> To me it seems like it's just making things up as it goes along.

Oh, aren’t we all?


The example of simple arithmetic is interesting. I think you might be right: that is evidence that GPT-3 might be generating what we might consider models of its input data. Very simple, primitive and fragile models, but yes, that's a start. Thank you.


A disembodied AI with a really good model might be able to do good theoretical science, but it would still need a way of acting in the physical world to do experimental science.


With a sufficiently effective language model it would be fairly easy to bridge this gap, by letting the text direct humans on the other side.

    Hypothesis: <AI writes this>
    Results: <human observations>
    <repeat>
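
Concretely, the loop might look something like this; `complete(prompt)` is a hypothetical stand-in for whatever language-model API is available, and the human types the Results lines:

    # Hypothetical sketch of the loop above; `complete` is whatever completion
    # function you have access to, and the human supplies the observations.
    def experiment_loop(complete, rounds=5):
        transcript = "Propose one testable hypothesis at a time.\n\n"
        for _ in range(rounds):
            transcript += "Hypothesis:"
            hypothesis = complete(transcript).strip()
            transcript += " " + hypothesis + "\n"
            transcript += "Results: " + input("Results (human observations): ") + "\n\n"
        return transcript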


That seems like a _very bad_ habit to get humans into.


Well, we already have tons of examples of computers telling humans what to do, e.g. autogenerated emails alerting a human to handle an issue.

The novel Manna explores where this can lead quite nicely - http://www.marshallbrain.com/manna1.htm


This strikes me as very similar to the debate around the Chinese Room.

https://plato.stanford.edu/entries/chinese-room/


I would love to talk to someone who actually believes in the Chinese Room argument. To me it seems to be ignoring the existence of emergent behavior, and the same argument could prove that a human Chinese speaker doesn't understand Chinese either: his neurons are just reacting to produce answers depending on the input and their current state (e.g. neurotransmitters and action potentials).


The Chinese Room argument is fairly transparently circular; if you assume understanding involves something more than applying a sufficiently complex set of deterministic rules, then a pure system of deterministic rules cannot ever achieve understanding.

Of course if you accept the required premise of the argument, you must accept that either, one, we don't live in a universe that is a pure system of deterministic rules, or, two, nothing in the universe can have true understanding.

The Chinese Room argument, scientific materialism, or the existence of true understanding—you can have at most two of those in a consistent view of the universe.


John Searle came up with that argument to conclude that despite a hypothetical Chinese room being able to have a conversation with someone, it doesn't truly have understanding, so N seems to be at least 1.

To your point though, the more interesting case is people who would disavow the Chinese Room argument, but then end up reflecting its views while arguing against the intelligence of this or that system.


Practically everyone in my online bubble feels similarly, it seems, though I do think steelmanning it is a great way to explore the topic. Same with the Mary's Room argument.

https://plato.stanford.edu/entries/qualia-knowledge/


Peter Watts explores this in the novel Blindsight. I don't want to give the plot away, but the main idea is really interesting, and relevant to this discussion.


I'm posting to second the recommendation for the novel. It is the most interesting exploration of the Mind's I (not a typo) that I've come across in modern sci-fi.

It can be read in its entirety at the author's site: https://rifters.com/real/Blindsight.htm


Threads like these are the reason why I keep coming back to HN !


Recently finished my second read of Blindsight. Enjoyed it more the second time than the first.


> We aren't sure, but the speculation is that in the process of training, GPT-3 found that the best strategy for correctly predicting the continuation of arithmetic expressions was to figure out the rules of basic arithmetic and encode them in some portion of its neural network, then apply them whenever the prompt suggested to do so.

I strongly disagree. GPT-3 has 100% accuracy on 2-digit addition, 80% on 3-digit addition, 25% on 4-digit addition and 9% on 5-digit addition. If it could indeed "understand arithmetic" the increase in number of digits should not affect its accuracy.

My perspective as an ML practitioner is that the cool part of GPT-3 is storing information effectively and it is able to decode queries easier than before to get the information that is required. Yet with things like arithmetic, the most efficient way would be to understand the rules of addition but the internal structure is too rigid to encode those rules atm.
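
For anyone who wants to see the falloff for themselves, a sweep like this will do it; `ask_model(prompt)` is a hypothetical hook for whatever model or API you're testing:

    # Sketch of measuring addition accuracy as a function of digit count.
    # `ask_model(prompt)` is a hypothetical hook returning the model's answer
    # string; plug in whatever API or local model you are testing.
    import random

    def addition_accuracy(ask_model, digits, trials=200):
        correct = 0
        for _ in range(trials):
            a = random.randint(10 ** (digits - 1), 10 ** digits - 1)
            b = random.randint(10 ** (digits - 1), 10 ** digits - 1)
            reply = ask_model(f"Q: {a} + {b} =").strip().split()
            correct += bool(reply) and reply[0] == str(a + b)
        return correct / trials

    # for d in range(2, 6): print(d, addition_accuracy(ask_model, d))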


I don't think training on language by itself is enough. Consider for example, if we found extraterrestrial transmissions from an alien civilization. We don't know what they look like, what they're made of or if they even have corporeal form. All we have is a large quantity of sequential tokens from their communications.

It's possible to train GPT3 to produce a facsimile of these transmissions, but doing so does not let us learn anything at all about these aliens, beyond statistical correlations like ⊑⏃⟒⍀ often occurring in close proximity to ⋏⟒⍙⌇ (what do they represent - who knows?). Just having the text is not enough, because we have no understanding of the underlying processes that produced the text.

That said, this is only a limitation of language models as they currently exist. I imagine it would be possible to train a ML model that encodes more of the human experience via video/audio/proprioception data.


I wouldn't be so sure we couldn't decode the meaning of an alien language given enough sample text. There have been some advances[1] towards learning a translation between two human languages in an unsupervised manner, meaning without any (language1, language2) sentence pairs to serve as the ground truth for building a translation. Essentially it independently learns abstract representations of the two languages from written text in each language, all the while nudging these abstract representations towards identical feature spaces. The result is a strong translation model trained without utilizing any upfront translations as training data.

The intuition behind this idea is that the structure inherent in a language is dependent upon features of the world being described by that language to some degree. If we can abstract out the details of the language and get at the underlying structure the language is describing, then this latent structure should be language-independent. But then translation turns out to simply be a matter of decoding and encoding a language to this latent structure. One limitation of this idea is that it depends on there being some shared structure that underlies the languages we're attempting to model and translate. It's easy to imagine this constraint holds in the real world as human contexts are very similar regardless of language spoken. The basic units and concepts that feature in our lives are more-or-less universally shared and so this shared structure provides a meaningful pathway to translation. We might even expect the world of intelligent aliens to share enough latent structure from which to build a translation given enough source text. The laws of physics and mathematics are universal after all.
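
The core alignment step is surprisingly small. The cited paper bootstraps with no dictionary at all (adversarial initialization plus refinement); the sketch below cheats and assumes a small seed dictionary, purely to show the Procrustes step at the heart of it:

    # Sketch of the orthogonal Procrustes alignment used in these methods.
    # The cited work needs no seed dictionary; one is assumed here only to
    # keep the illustration short.
    import numpy as np

    def align(X, Y):
        """X, Y: d x n embedding matrices for n seed word pairs (source, target).
        Returns the orthogonal W minimizing ||W X - Y||_F."""
        U, _, Vt = np.linalg.svd(Y @ X.T)
        return U @ Vt

    def translate(src_vec, W, target_words, target_vecs):
        """Map a source-language vector and return its nearest target word."""
        mapped = W @ src_vec
        sims = target_vecs.T @ mapped / (
            np.linalg.norm(target_vecs, axis=0) * np.linalg.norm(mapped) + 1e-9)
        return target_words[int(np.argmax(sims))]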

[1] https://openreview.net/pdf?id=rkYTTf-AZ


This doesn't really make any sense to me. Shared structure is not enough to assume shared meaning. Even given the idea of Universal Grammar (which seems extremely likely, given the interchangeability of human languages for babies), that tells us nothing about the actual words and their association with the human world.

Take the sentence 'I fooed a bar with a Baz' - can you infer what I did from this?


>Shared structure is not enough to assume shared meaning.

How do you define meaning? If we can find a mapping between the sequence of words in a language and the underlying structure of the world, then we by definition know what those words mean. The question then reduces to whether there will be multiple such plausible mappings once we have completely captured the regularity of a "very large" sequence of natural language text. I strongly suspect the answer is no, there will only be one or a roughly equivalent class of mappings such that we can be confident in the discovered associations between words and concepts.

The number of relationships (think graph edge) between things-in-the-world is "very large". The set of possible relationships between entities is exponential in the size of the number of entities. But the structure in a natural language isn't arbitrary, it maps to these real world relationships in natural ways. So once we capture all the statistical regularities, there should be some "innocent" mapping between these regularities and things-in-the-world. "Innocent" here meaning a relatively insignificant amount of computation went into finding the mapping (relative to the sample space of the input/output).

>Take the sentence 'I fooed a bar with a Baz' - can you infer what I did from this?

Write me a billion pages of text while using foo bar baz and all other words consistently throughout, and I could probably tell you.


You're relying on having texts covering all aspects of a language.

Here's a good example. Suppose I had a huge set of recipe books from a human culture - just recipes, no other record of a culture.

I might be able to get as far as XYZZY meaning "a food that can be sliced, mashed, fried and diced" but how would I really tell if XYZZY means carrot, potato, or tomato, or tuna?


> If we can find a mapping between the sequence of words in a language and the underlying structure of the world, then we by definition know what those words mean.

This seems a bit tautological to me: if being able to make certain mappings is understanding, then does this not amount to "once we understand something, understanding it is a solved problem?"

On the other hand, the apparently simplistic mappings used by these language models have achieved way more than I would have thought, so I am somewhat primed to accept that understanding turns out to be no more mysterious than qualia.

I doubt that just any mapping will do. One aspect of human understanding that still seems to be difficult for these models is reasoning about causality and motives.

I think it is a fairly common intuition that one cannot understand something just by rote-learning a bunch of facts.


>This seems a bit tautological to me

I meant it in the sense of: given any reasonable definition of "understanding", finding a mapping between the sequences of words and the structure of the world must satisfy the definition.

>I think it is a fairly common intuition that one cannot understand something just by rote-learning a bunch of facts.

I agree, but it's important to understand why this is. The issue is that learning an assignment between some words and some objects misses the underlying structure that is critical to understanding. For example, one can point out the names of birds but know nothing about them. It is once you can also point out details about their anatomy, how they interact with their environment, find food, etc and then do some basic reasoning using these bird facts that we might say you understand a lot about birds.

The assumption underlying the power of these language models is that a large enough text corpus will contain all these bird facts, perhaps indirectly through being deployed in conversation. If it can learn all these details, deploy them correctly within context, and even do rudimentary reasoning using such facts (there are examples of GPT-3 doing this), then it is reasonable to say that the language model captures understanding to some degree.


Ok, I think I'm getting some of your idea more clearly. Essentially, the observation is that we probably can't consistently replace the words in a novel with other words without preserving the meaning of the novel (consistently meaning each word is always replaced with the same other word).

I think the biggest problem with this argument is the assumption that '[the structure in a natural language] maps to these real world relationships in natural ways'. One thing we know for sure is that human language maps to internal concepts of the human mind, and that it doesn't map directly to the real world at all. This is not necessarily a barrier to translation between human languages, but I think it makes the applicability of this idea to translations between human and alien languages almost certainly null.

Perhaps the most obvious aspect of this is any word related directly to the internal world - emotions, perceptions (colors, tastes, textures etc) - there is no hope of translating these between organisms with different biologies.

However, essentially any human word, at least outside the sciences, falls in this category. At the most basic level, what you perceive as an object is a somewhat arbitrary modeling of the world specific to our biology and our size and time scale. To a being that perceived time much slower than us, many things that we see as static and solid may appear as more liquid and blurry. A significantly smaller or larger creature may see or miss many details of the human world and thus be unable to comprehend some of our concepts.

Another obstacle is that many objects are defined exclusively in terms of their uses in human culture and customs - there is no way to tell the difference between a sword, a scalpel, a knife, a machete etc unless you have an understanding of many particulars of some specific human society. Even the concept of 'cutting object' is dependent on some human-specific perceptions - for example, we perceive a knife as cutting bread, but we don't perceive a spoon as cutting the water when we take a spoonful from a soup, though it is also an object with a thin metal edge separating a mass of one substance into two separate masses (coincidentally, also doing so for consumption).

And finally, even the way we conceive mathematics may be strongly related to our biology (given that virtually all human beings are capable of learning at least arithmetic, and not a single animal is able to learn even counting), possibly also related to the structure of our language. Perhaps an alien mind has come up with a completely different approach to mathematics that we can't even fathom (though there would certainly be an isomorphism between their formulation of maths and ours, neither of our species may be capable of finding it).

And finally, there are simply so many words and concepts that are related to specific organisms in our natural environment, that you simply can't translate without some amount of firsthand experience. I could talk about the texture of silk for a long while, and you may be able to understand roughly what I'm describing, but you certainly won't be able to understand exactly what a silkworm is unless you've perceived one directly in some way that is specific to your species, even though you probably could understand I'm talking about some kind of other life form, it's rough size and some other details.


>human language maps to internal concepts of the human mind, and that it doesn't map directly to the real world at all.

I disagree. Mental concepts have a high degree of correlation with the real world, otherwise we could not explain how we are so capable of navigating and manipulating the world to the degree that we do. So something that correlates with mental concepts necessarily correlates with things-in-the-world. Even things like emotions have real world function. Fear, for example, correlates with states in the world such that some alien species would be expected to have a corresponding concept.

>There is no way to tell the difference between a sword, a scalpel, a knife, a machete

There is some ambiguity here, but not as much as you claim. Machetes, for example, are mostly used in the context of "hacking", either vegetation or people, rather than precision cuts of a knife or a scalpel. These subtle differences in contextual usage would be picked up by a strong language model and a sufficient text corpus.


> Mental concepts have a high degree of correlation with the real world, otherwise we could not explain how we are so capable of navigating and manipulating the world to the degree that we do.

There is obviously a strong association from human mental concepts to real world objects. The question is whether the opposite mapping exists as well: there could well be infinitely many non-human concepts that could map onto the physical world. They could have some level of similarity, but nevertheless remain significantly different.

For a trivial example, in all likelihood an alien race that has some kind of eye would perceive different colors than we do. With enough text and shared context, we may be able to understand that worble is some kind of shade of red or green, but never understand exactly how they perceive it (just as they may understand that red is some shade of worble or murble, but never exactly). Even worse, they could have some colors like purple, which only exists in the human mind/eye (it is the perception we get when we see both high-wavelength and low-wavelength light at the same time, but with different phases).

Similarly, alien beings may have a significantly different model of the real world, perhaps one not divided into objects, but, say, currents, where they perceive moving things not as an object that changes location, but as a sort of four-dimensional flow from chair-here to chair-there, just like we perceive a river as a single object, not as water particles moving on a particular path. Thus, it may be extremely difficult if not impossible to map between our concepts and the real world back to their concepts.

> Fear, for example, correlates with states in the world such that some alien species would be expected to have a corresponding concept.

Unlikely, given that most organisms on earth have no semblance of fear. Even for more universal mental states, there is no reason to imagine completely different organisms would have similar coping mechanisms as we have evolved.

> Machetes, for example, are mostly used in the context of "hacking", either vegetation or people, rather than precision cuts of a knife or a scalpel.

Well, I would say hacking vs. cutting are not significantly different concepts; they are a matter of human-specific and even culturally-specific degrees, which seem to me unlikely to be uniquely identifiable, though some vague level of understanding could probably be reached.


This is as true of any information channel, including your eyes and ears.


That kind of gets into "what is it like to be a bat" territory.

The more imminent question is more of engineering than philosophy - what does it take for GPT-3 to not make the mistakes it does? This would require it to have some internal model for why humans generate text (persuasion, entertainment, etc.) as well as the social context in which that human generated the text. On a lower level it also needs to know about cognitive shortcuts that humans take for granted (object permanence, gravity)

Basically, some degree of human subjective experience must be encoded and fed to the model. That's a difficult problem, but not an intractable one.


We don't even have to look to hypothetical aliens for an example. All the bronze-age Aegean scripts, except for Linear B, remain undeciphered.


I certainly agree that GPT-3 appears to be learning how to do mathematics. I suspect that if you gave it enough it might perhaps even learn the maths of physics.

I suspect that if it did that, it would be able to write a very convincing fake paper about how it designed and tested an Alcubierre drive, and that the main clue about the paper being fake being a sentence such as “we dismantled Jupiter for use as a radiation shield against the issue raised by McMonigal et al, 2012”.

Or, to put it another way, the hardest of hard SciFi, but still SciFi, not science.


Nothing you say convinces me that GPT-3 is exhibiting any conceptual understanding.

Imitating existing texts better is not conceptual understanding.

"Understanding" means you can explain why you made a decision. It means there exists a model with conceptual entities that you can access and make available to others.

What GPT-3 does is this: "I am given many answers to similar questions, and I build up a huge model that reflects these answers. If I'm given a new question, I come up with a response that's probably right, based on the previous answers, but there's no explanation possible."

Don't get me wrong - it's amazing! But it's not understanding anything yet.

Even humans have skills that we know but do not understand - like "walking" for most of us!

But on abstract questions, we almost always have access to a complete set of reasons. "Why did you go back to the store?" "I left my bag there." "Why did you talk to that man?" "I know he's the manager, I'm a regular." "Why were you happy?" "I had my bag."

(Indeed, this is so common that people often "backdate" reasons for actions that didn't really have any reason at the time. But I digress.)


I wonder how well it would perform in accuracy if given a large number of simple but lengthy sums like 13453 + 53521. Increased set size would move it beyond simple input/output memorization. Although if it recurses properly and carries the digit it could be text parsing and have an accurate but probably very inefficiently written math parser.


> Although if it recurses properly and carries the digit it could be text parsing and have an accurate but probably very inefficiently written math parser.

I suspect this is how many humans do arithmetic (especially considering how many people conflate numbers with their representation as digits). So if GPT-3 is doing that, that's pretty impressive.


You don't have to wonder. In their paper: https://arxiv.org/abs/2005.14165 they state it has 0.7% accuracy on zero shot 5 digit addition problems and 9.3% accuracy on few shot 5 digit addition problems.


By the way: Arithmetic accuracy is better if dollar sign and commas are added (financial data in the training set):

http://gptprompts.wikidot.com/logic:math


You do have to wonder, because as that section states, the BPEs may impede arithmetic, and as we've found using the API, if you use commas, the accuracy (zero and few-shot) goes way up.
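
The BPE point is easy to see with the open GPT-2 tokenizer (GPT-3 reuses the same byte-pair vocabulary): long digit strings get chopped into irregular multi-digit chunks, and commas change the chunking:

    # Long digit strings are split into irregular chunks by the BPE vocabulary;
    # commas (and a dollar sign) change how the digits are grouped.
    from transformers import GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    for s in ["1234567", "1,234,567", "$1,234,567"]:
        print(s, "->", tok.tokenize(s))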


Solomonoff induction would imply the algorithm that learns the rules of arithmetic will have the most concise model for the data. But it is unclear these GPT-3 type algorithms are Solomonoff learners.


>> But GPT-3 is much more successful, including at giving correct answers to arithmetic problems that weren't in its training set.

That's not exactly what the GPT-3 paper [1] claims. The paper claims that a search of the training dataset for instances of, very specifically, three-digit addition, returned no matches. That doesn't mean there weren't any instances, it only means the search didn't find any. It also doesn't say anything about the existence of instances of other arithmetic operations in GPT-3's training set (and the absence of "spot checks" for such instances of other operations suggests they were, actually, found- but not reported, in time-honoured fashion of not reporting negative results). So at best we can conclude that GPT-3 gave correct answers to three-digit addition problems that weren't in its training set and then again, only the 2000 or so problems that were specifically searched for.

In general, the paper tested GPT-3's arithmetic abilities with addition and subtraction between one to five digit numbers and multiplication between two-digit numbers. They also tested a composite task of one-digit expressions, e.g. "6+(4*8)" etc. No division was attempted at all (or no results were reported).

Of the attempted tasks, all but addition and subtraction between one- to three-digit numbers had accuracy below 20%.

In other words, the only tasks that were at all successful were exactly those tasks that were the most likely to be found in a corpus of text, rather than a corpus of arithmetic expressions. The results indicate that GPT-3 cannot "perform arithmetic" despite the paper's claims to the contrary. They are precisely the results one should expect to see if GPT-3 was simply memorising examples of arithmetic in its training corpus.

>> So what changed? We aren't sure, but the speculation is that in the process of training, GPT-3 found that the best strategy for correctly predicting the continuation of arithmetic expressions was to figure out the rules of basic arithmetic and encode them in some portion of its neural network, then apply them whenever the prompt suggested to do so.

There is no reason why a language model should be able to "figure out the rules of basic arithmetic" so this "speculation" is tantamount to invoking magick.

Additionally, language models and neural networks in general are not capable of representing the rules of arithmetic because they are incapable of representing recursion and universally quantified variables, both of which are necessary to express the rules of arithmetic.

In any case, if GPT-3 had "figure(d) out the rules of basic arithmetic", why stop at addition, subtraction and multiplication between one to five digit numbers? Why was it not able to use those learned rules to perform the same operations with more digits? Why was it not capable of performing division (i.e. the opposite of multiplication)? A very simple answer is: GPT-3 did not learn the rules of arithmetic.

_________

[1] https://arxiv.org/abs/2005.14165


I dunno, to me it seems clear that there is nothing of what we call intelligence in these neural networks. And I think we could have a general AI that can problem-solve in the world but have zero of what we know of as understanding and self-awareness.


Another way to think about it is comparing to how children learn. First, children spend an inordinate amount of time just trying to make sense of words they hear. Once they develop their language models, adults can explain new concepts to them using the language. What'd be really exciting is being able to explain a new concept to GPT-n in words, and have it draw conclusions from it. Few-shot learning is a tiny step in that direction.


Children don't spend inordinate amounts of time learning words. In fact, past the first months, children often learn words from hearing them a single time.


I have a 4, 6, and 8 year old, and each of them are still learning words. Yeah they don’t spend 80% of each day learning words, but building up their vocabulary legit takes a looong time.


Oh, absolutely. I'm 31 and I'm still learning words!

But I don't think I've ever spent time to learn a particular word - it's almost always enough to hear it in context once, and maybe get a chance to actually use it yourself once or twice, and you'll probably remember it for life.

If it's a word for a more complex concept (e.g. some mathematical construct), you may well need more time to actually understand the meaning, and you may also pretty easily forget the meaning in time, but you'll likely not forget the word itself.


"But I don't think I've ever spent time to learn a particular word - it's almost always enough to hear it in context once, and maybe get a chance to actually use it yourself once or twice, and you'll probably remember it for life."

I'd strongly bet against this. If it were true, SAT and similar vocabulary tests would be trivial to anybody who has taken high school English, and I think it is not the case that most people perceive the SAT to be trivial.


That's of course correct. Perhaps GPT-3 can do that too? I don't have access to it, but I wonder if it can be taught new words using few-shot learning.

In fact, even GPT-2 gets close to that. Here's what I just got on Huggingface's Write With Transformer: Prompt: "Word dfjgasdjf means happiness. What is dfjgasdjf?" GPT-2: "dfjgasdjf is a very special word that you can use to express happiness, love or joy."

What takes time is all the learning a child needs to go through before they can be taught new words on the spot.
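
If anyone wants to poke at this locally rather than through Write With Transformer, something along these lines reproduces the setup (it samples, so outputs will vary run to run):

    # Rough local equivalent of the Write With Transformer experiment above.
    from transformers import pipeline, set_seed

    set_seed(0)
    generator = pipeline("text-generation", model="gpt2")
    prompt = "Word dfjgasdjf means happiness. What is dfjgasdjf?"
    print(generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.9)[0]["generated_text"])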


How can we tell whether or not GPT-3 has understanding of the data?


I think the best way to tell whether GPT-3 has understanding of its data is by asking questions related to the data but not explicit in the training dataset.


Maybe by asking it to clean up its own training data?


By asking counterfactual questions.


I think this is a very good take.


I've been wondering about something similar to you, but I read one of Pearl's causality books recently and thought that might be the missing piece.

It's certainly impressive what GPT-3 can do, but it boggles the mind how much data went into it. By contrast a well-educated renaissance man might have read a book every month or so from age 15 to 30? That doesn't seem to be anywhere near what GPT could swallow in a few seconds.

When you look at how GPT answers things, it kinda feels like someone who has heard the keywords and can spout some things that at least obscure whether it has ever studied a given subject, and this is impressive. What I wonder is whether it can do reasoning of the causality kind: what if X hadn't happened, what evidence do we need to collect to know if theory Z is falsified, which data W is confounding?

To me it seems that sort of thing is what smart people are able to work out, with a lot of reading, but not quite the mountain that GPT reads.


> By contrast a well-educated renaissance man might have read a book every month or so from age 15 to 30? That doesn't seem to be anywhere near what GPT could swallow in a few seconds.

You're ignoring the insane amount of sensory information a human gets in 30 years. I think that absolutely dwarfs the amount of information that GPT-3 eats in a training run.


But that sensory information includes very few written words. GPT(n) isn't being trained on "worldly audio data", or "worldly tactile data", or in fact any sensory data at all.

So the two training sets are completely orthogonal, and the well educated renaissance man is somehow able to take a very small exposure to written words and do at least as well as GPT(n) in processing them and responding.


And the renaissance man has tons of structure encoded in his brain at birth already. Just like GPT-3 does before you give it a prompt. I'm not saying this is fully equivalent (clearly a baby can't spout correct Latex just by seeing three samples), but you simply cannot just handwave away thousands of years of human evolution and millions of years of general evolution before that.

The renaissance man is very obviously not working solely based on a few years of reading books (or learning to speak/write).


A person who is never taught to read will never be able to respond to written text. So the renaissance-era man is working "solely" based on their lived experience with text, which compared to GPT(n) is tiny.

Ah! you cry. Don't humans have some sort of hard-wiring for speech and language? Perhaps. But it is clearly completely incapable of enabling an untrained human to deal with written text. Does it give the human a head start in learning to deal with written text? Perhaps (maybe even probably). It demonstrably takes much less training than GPT(n) does.

But that is sort of the point of the comment at the top of this chain.


Did this renaissance man teach themselves to read from scratch or are we assuming they were assisted in their schooling?


Doesn't make too much difference to the overall point.


That's an interesting point. I'm not sure how to measure that though. Also my guess is we have the sensors on for part of the day only, plus that's filtered heavily by your attention process, e.g. you can't read two books at once.


Yeah. A dog never read a book and only has rudimentary understanding of language (if any - no idea, maybe they just pattern match cause and effect) but a dog-level AI would be incredibly valuable. And you can train a dog to be fairly competent in a task in less than a year.


Do people generally learn what to say from sensory data? How would the sensory data impact our ability to produce meaningful information?


Written language is used, in large part, to express sensory data (ex: colors, shapes, events, sounds, temperatures, etc). Abstract models are, through inductive reasoning, extrapolated from that sensory information. So in effect more sensory data should mean more accurate abstract models.

For example, it might take several paragraphs to wholly capture all the meaningful information in one image in such a way that it can be reproduced accurately. Humans, and many animals, process large amounts of data before they are even capable of speech.

The data GPT-3 was provided with pales in comparison. It is unclear whether these GPT models are capable of induction because it may be that they need more or better sanitised data to develop abstract models. Therefore they should be scaled up further until they only negligibly improve. If even then they are still incapable of general induction, or have inaccurate models, then the transformer model is not enough, or perhaps we need a more diverse set of data (images, audio, thermosensors, etc).



The well-educated renaissance man has a lot more feedback on what is useful and what isn't. I think that GPT-3 could be vastly improved simply by assigning weights to knowledge, i.e. valuing academic papers more.

Humans get this through experience and time (recognizing patterns about which sources to trust) but there is nothing magical about it. Should be very easy to add this.


I had this exact conversation with a friend over the weekend. If GPT-n weights all input equally then we are truly in for a bad ride. It's basically the same problem we are experiencing with social media.


It is a very interesting problem. Throughout history humans have been able to rely on direct experience via our senses to evaluate input and ideas.

Many of those ideas are now complex enough that direct experience doesn't work, e.g. global warming, economics, various policies. Furthermore, even direct (or near-direct) experience such as video is becoming less trustworthy due to technology like deepfakes and eventually VR and Neuralink.

It seems to me that this problem of validating what is real and true might soon be an issue for both humans and AI. Are we both destined to put our future in the hands of weights provided by 'experts'?


A well-educated renaissance man is not a blank slate, he is the result of millions of years of selection.


The human equivalent to GPT-3 training is not so much the learning one does in a lifetime, but the millions of years of the evolutionary process. "Normal" human learning is more akin to fine-tuning, I think, although these analogies are flawed anyway.


Obviously the genome encodes what you need to grow a brain, but there aren't enough bits in the human genome to encode many neural network weights.


Good point. By this logic, evolution is more about creating the best architecture, which I think is also a good analogy. But I also think that a lot of the brain's "weights" are pretty much set before any experience (maybe less so in humans, but several animals are born pretty much ready, even ones with complex brains, like whales). The genome information is, in a sense, very compressed, so even if it isn't setting individual weights, it does determine the weights somehow, I think.

Does anyone know if anyone has investigated these questions more seriously elsewhere?


> What I wonder is whether it can do reasoning of the causality kind: what if X hadn't happened, what evidence do we need to collect to know if theory Z is falsified, which data W is confounding?

I think logic is the easy part, we can already do that with our current technology.

The difficult part is disambiguating an input text, and transforming it into something a logic subsystem can deal with. (And then optionally transform the results back to text).


> When you look at how GPT answers things, it kinda feels like someone who has heard the keywords and can spout some things that at least obscure whether it has ever studied a given subject, and this is impressive.

Not unlike a non-technical manager discussing tech!


> it will just make up convincing rubbish

It's not certain that this is always the case. In at least one case I've seen, if you give it question-and-answer prompts where you don't demonstrate that you will accept the answer "your question is nonsense", it will indeed make things up; but if you include "your question is nonsense" as an acceptable answer in the sample prompts, then it will use it correctly. See https://twitter.com/nicklovescode/status/1284050958977130497 .

It seems that we have a lot to learn about how to use GPT-3 effectively!
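
The pattern is really just about what the examples license; roughly this shape (an illustration of the idea, not the exact prompt from the tweet):

    # Illustration of the prompting pattern described above: the examples
    # demonstrate that "your question is nonsense" is an acceptable answer,
    # so the model can use it instead of confabulating.
    EXAMPLES = [
        ("What is the capital of France?", "Paris."),
        ("How many eyes does a horse have?", "Two."),
        ("How do you sporgle a morgle?", "Your question is nonsense."),
    ]
    prompt = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in EXAMPLES)
    prompt += "\n\nQ: How many rainbows does it take to jump from Hawaii to seventeen?\nA:"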


> It cannot tell right from wrong

It doesn't know which facts about the world are true and which are fabricated except through the text it's trained on, but to a large extent neither do you. It suffices to merely have the ability to reason about it. The primary difference is that whereas you reason fairly competently from one perspective with one pseudo-coherent set of goals, GPT-3 reasons to a weak degree from all perspectives, privileging no view point in particular.

How important this ends up being depends on where the model plateaus. On one extreme, if it plateaus close to where GPT-3 already is, no harm done, it's a fun toy. On the other, if it scales until perplexity gets to far-superhuman levels, it doesn't matter at all, since you can just prompt it with Terence Tao talking about his latest discovery.

Naturally, it will land somewhere in the middle of these points. The question is ultimately then whether the landing point captures enough general reasoning that you can use it to bootstrap some more advanced reasoning agent. A sufficiently powerful GPT-N should, for example, be able to deliberate over its own generated ideas and sort the coherent reasoning from the incoherent.
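
A crude version of that bootstrap is already expressible as a prompt loop; here `complete(prompt, temperature)` is a hypothetical wrapper around whatever completion API is available:

    # Sketch of "deliberate over its own generated ideas": sample several
    # candidate answers, then ask the model itself to pick the most coherent.
    # `complete` is a hypothetical completion wrapper, not a real API.
    def generate_then_select(complete, question, n=5):
        candidates = [complete(f"Q: {question}\nA:", temperature=0.9).strip()
                      for _ in range(n)]
        listing = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
        verdict = complete(
            f"Question: {question}\nCandidate answers:\n{listing}\n"
            "Which numbered answer is the most coherent and correct? Answer with a number:",
            temperature=0.0)
        digits = "".join(ch for ch in verdict if ch.isdigit())
        choice = int(digits[0]) if digits else 1
        return candidates[min(max(choice, 1), n) - 1]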


> It seems to me that anything trained on internet data is kinda doomed to poison itself on the high ratio of garbage floating around here?

That same sentiment could be equally applied to humans, and not just in the internet era, but throughout all of history. There will always be misinformation and "wrong" opinions out there. "It cannot tell right from wrong" is an accusation leveled against human beings every day. We can't even all agree on what is right and wrong, truth or untruth.

A true AI is going to have to wade through all that and make its own decisions to be viewed and judged from many different perspectives, just like the rest of us.


I've also been wondering this. It is like Kessler Syndrome. We have to be careful not to pollute our ecosystem of data.

https://en.wikipedia.org/wiki/Kessler_syndrome

Speaking with GPT-3 also makes one realize how influenced its predictions of AI scenarios are by dystopian memes. In any conversation in which you are "speaking to the AI", the AI can go rogue.

There just aren't enough positive role models authors have written for AGI.


> It seems to me that anything trained on internet data is kinda doomed to poison itself on the high ratio of garbage floating around here?

Ah one only needs to think back to Tay to know how these sort of things will end.

https://en.wikipedia.org/wiki/Tay_(bot)

(Imagine 4chan got wind of this bot and retrained it, which is probably what happened...)


> It would be a bit like how carbon dating or production of low background steel changed after 1945 due to nuclear testing

https://news.ycombinator.com/item?id=23896293

I guess books published will be more useful than reddit rants (depending on the application)


> anything trained on internet data is kinda doomed to poison itself on the high ratio of garbage floating around here?

Low-quality noise cancels out and leaves the high-quality signal. In the limit, the internet offers the true sequence probabilities for compression of natural text.

You can also put more weight on authoritative data sources, such as Wikipedia and StackOverflow, but even uniformly weighted: It is possible to sequence-complete prime numbers, despite the many many pages online with random numbers.

GPT-3 is trained on a filtered version of Common Crawl, enhanced with authoritative datasets, such as Books1, WebText, and Wikipedia-en. Moderation is done automatically, with a toxicity classifier/toggle. If GPT-n becomes good enough to be accepted in authoritative datasets, then it is perfectly fine training data, a form of semi-supervised learning.
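
The "more weight on authoritative sources" part is quite literal in GPT-3's case: the smaller curated corpora are sampled far more often than their share of the raw tokens. A sketch of that kind of weighted sampling, with the mixture fractions taken from the GPT-3 paper (the sampling code itself is just an illustration, not OpenAI's pipeline):

    # Mixture weights are the ones reported in the GPT-3 paper; the sampling
    # function is only an illustration of weighted source selection.
    import random

    MIXTURE = {                      # fraction of training examples per corpus
        "common_crawl_filtered": 0.60,
        "webtext2": 0.22,
        "books1": 0.08,
        "books2": 0.08,
        "wikipedia_en": 0.03,
    }

    def sample_source():
        return random.choices(list(MIXTURE), weights=list(MIXTURE.values()), k=1)[0]

    # Wikipedia is well under 1% of the raw tokens but ~3% of what the model
    # sees, so its tokens are effectively upweighted several-fold.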

Bias is going to be a double-edged sword: I believe it will be impossible to prescribe common sense, or to sanitize common sense to remove, say, gender bias, and still be able to understand a sexist joke about female programmers, or male nurses. We want an AI to be human, but we don't want it to associate CEOs with white males, dark hair, wearing suits. That will conflict.


'authoritative data sources, such as Wikipedia"

Lol


> Canberra is the capital city of Australia.


I largely agree with the arguments made, but the following assertion is plain bogus

> GPT-3 is the first NLP system that has obvious, immediate, substantial economic value.

Text mining (relation extraction, named entity recognition, terminology mining) and sentiment analysis are billion dollar industries and are being directly applied right now in marketing, finance, law, search, automotive, basically every industry. Machine translation is another huge industry of its own. Chat bots were all the hype a few years ago. Let's not reduce the whole field of NLP to language generation.


> are billion dollar industries

When speaking of billion dollar investments, a billion dollar industry is not substantial. Google and Facebook's industries are advertising, at $600bn/year. Amazon's industry is retail, at $25tn/year.

What's opened up by GPT-3 and its prompt-programming abilities is services, without qualification. That's $50tn/year, and capturing some tiny percentage of it is what's needed to make a billion-dollar investment worthwhile.

That said, I admit this isn't the mindset most people take when they read 'substantial'.

e: I changed the wording from 'substantial' to 'transformative', thanks!


GPT-3 lets you create rigged demos to do lots of tasks but so far it's not reliable enough to do anything in production. It seems unlikely to get there using output based on random word selection. Nobody is even talking about error rates yet.

The best applications are probably when error rates don't matter because a human is just going to use it for inspiration.


I might have missed the business plan behind monetising GPT3 here. Can you elaborate on why you think prompt programming will successfully take a cut from services?

Prompt-programming is a standard feature of all LMs. What differentiates GPT-3 is not this application but the quality of the output. NLP companies such as chatbot providers and specialised search (patents, legal assistants, tenders) have been using domain-specific LMs for years.


You both agree I think. He's not saying that GPT-3 invented the revolutionary ability of prompt programming, but that prompt programming allows GPT-3 to be applied to arbitrary contexts (from programming to providing legal advice to generating fiction). That amazing generality and high quality allow it to be applicable to most services.

So it's taking some slice of the $50tn pie.


Yeah this is a bizarre statement considering that GPT-3 is definitely not any of those things really. GPT-3 is far, far too computationally expensive to have value in industry. A linear CRF is more useful than most NN approaches in industry right now, just simply because in many circumstances you want to have something that you can apply to a few billion documents and get the result within a few hours, then tweak a few things and repeat if you like. These simple models also have the ability to be predictable as well. Some transformer or lstm methods can be useful in industry, but it really depends on the application. I certainly would not be using GPT-like systems for much in industry, other than gimmicks for marketing. GPT-3 is useful for academia - not industry.


Hm? GPT-3 is relatively cheap to inference from, at least compared to the cost of training. You can load all the params onto a single TPU, actually. (A TPU can allocate up to 300GB on its CPU without OOM'ing.)

AI dungeon is also powered by GPT-3, and it's quite snappy. I'm not sure why GPT-3 is seen as computationally expensive, but it seems workable.


Only the premium-exclusive version of the model, named Dragon, which was released last month.


Premium here means ten dollars a month, I think.


GPT-3 is not that expensive. Estimating from the paper, to train the model, the GPU hardware costs were a few million dollars, and the electricity costs were probably under 100k. This is totally feasible for many companies today, especially if the hardware is a fixed cost and can be reused for training multiple models.

And as mentioned elsewhere, inference for a trained model is much, much cheaper.


Is sentiment analysis really that good already?

Every time I've looked at the state of the art in sentiment analysis, it seems to be suffering from the same issue that bag-of-words has with modifiers like "not". Or is that more a theoretical problem than a practical one?

I appreciate this is a rapidly moving field, so my knowledge could easily be out of date.


Are modifiers an actual issue for many applications?

"This isn't a terrible horrible restaurant that nobody should ever go to" seems like 1) it doesn't mean it's actually a good restaurant either 2) the writer might be joking and sarcastic and 3) this will be very rare in actual reviews.

Put another way, certain modifiers contextually go with certain words and sentiments, so why shouldn't state of the art systems lean on that fact, notwithstanding the strict application of grammar?
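
To make the failure mode concrete, here is a toy bag-of-words scorer next to the crude negation-flip heuristic many systems bolt on (the lexicon and window size are invented for the example):

    # Toy lexicon-based sentiment: plain bag-of-words vs. a naive negation flip.
    LEXICON = {"terrible": -2, "horrible": -2, "great": 2, "good": 1}
    NEGATORS = {"not", "isn't", "never", "nobody"}

    def bow_score(tokens):
        return sum(LEXICON.get(t, 0) for t in tokens)

    def negation_aware_score(tokens, window=3):
        score, flip_left = 0, 0
        for t in tokens:
            if t in NEGATORS:
                flip_left = window            # flip polarity of the next few words
                continue
            s = LEXICON.get(t, 0)
            score += -s if flip_left else s
            flip_left = max(0, flip_left - 1)
        return score

    review = "this isn't a terrible horrible restaurant".split()
    print(bow_score(review))             # -4: plain counting reads it as very negative
    print(negation_aware_score(review))  # +4: the flip heuristic overcorrects the other way

Neither score captures the sarcasm or the "not terrible doesn't mean good" nuance, which is roughly the point being argued above.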


There was a story that a language lecturer had just explained how double-negatives were sometimes a positive and sometimes an emphasised negative, and that likewise some languages used a double-positive to mean a negative. He claimed that English was not such a language, using double-positives only for emphasis, to which one of the students said “yeah yeah”.


An MIT linguistics professor was lecturing his class the other day. "In English," he said, "a double negative forms a positive. However, in some languages, such as Russian, a double negative remains a negative. But there isn't a single language, not one, in which a double positive can express a negative."

A voice from the back of the room piped up, "Yeah, right."

(https://www.ling.upenn.edu/~beatrice/humor/double-positive.h...)


To be fair, "yeah, right" is a sarcastic statement, and linguistically the two words do not scope each other, so the positive statement is produced at the pragmatic level, whereas double negatives are syntactico-semantic.


The issue you describe is typically called "Valence shifting" in this specific case "negation processing". It is of course a difficult problem to capture word-level sentiment and emotions but recent techniques in academic work obtain decent results.

However, industry typically relies on sentence- or document-level sentiment in, for instance, customer reviews, with systems obtaining 80-90 F1-score, which is very good. Often in e-commerce, aspect-based sentiment analysis is used, in which a qualifying sentiment is attached to a target aspect, e.g. from a phone review systems extract: battery: large > positive; screen: dim > negative. You might have seen these types of reports in aggregate on review or e-commerce sites yourself.

It is, however, an ongoing field of research to process the scope of negation and uncertainty, but the field is making strides. State-of-the-art attention-based models obtain good scores on benchmark fine-grained sentiment analysis datasets such as GoodFor/BadFor and MPQA 2.0, at around 70% F1-score [1]. This performance is nearly enough for commercial systems, depending on how you employ them.

1. https://link.springer.com/article/10.1186/s13673-019-0196-3


I think he means “obvious, immediate, substantial economic value” to non-technical people. It takes little effort to imagine how to monetize it, even for regular folks.


For many, GPT-1 through GPT-3 is their first exposure to language modelling technology, which is great, but language modelling and pretraining were already widely used in nearly every NLP task even before GPT-1. Any NLP engineer has used them, so it is a bit odd to claim that large-scale pretrained LMs were revolutionised by GPT-3.

Don't get me wrong the hype is largely deserved because of the performance and engineering/research/funding effort required. Plus cool demos and media marketing from OpenAI helps a lot in spreading awareness.

OpenAI has definitely revolutionised the marketing for language models, no doubt. Let's wait and see if they manage to do the same for the economic valorisation.


really? I am very impressed by GPT-3 but I still don't see any way to make money "obviously" out of it.

Maybe it can be an adjunct to humans in some tasks, but then so could existing technologies too, I guess?


Do you want a computer that can reliably understand you and give you the best possible answer 99% of the time?

I think most people would go: yes! How much does it cost?


Oh, I thought we were talking about GPT-3


If you're running a search engine maybe.

But everyday people are used to getting 80% answers from search engines; I don't think many would pay for something that is "like google, but a bit better".

This seems to be the current issue for many things that we used to pay for (dictionaries, encyclopaedias, newspapers, etc), and I'm not sure this would be different.


> "like google, but a bit better"

For simple queries like “who was the president of country X in YYYY” it’s probably just a bit better (if cached, of course, Google search is wicked fast).

But for more complex queries, Google is still remarkably dumb. Or downright insolent, ignoring my verbatim selection or quoted terms.

I’d pay good money for scarily smart search and a “grep for the web” service, that included JSON, CSS, JavaScript, comments, whatever. A toggle button for dumb/smart search


I would easily pay money for something that is like Google, but a bit better.


I won't.


Google's majority revenue is from search. So a bit better than Google, particularly if it is integrated into Bing (as Microsoft has invested in OpenAI) and allows Bing to capture market share from Google, would be really lucrative.


On the other hand, a computer that gives you less accurate answers than more specialised tools but can spin them into something that looks like an essay is less obviously monetisable than, say, already commercially available services like Alexa, which incorporate NLP but don't rely exclusively on it.


GPT-3 is not capable of what you are describing.


I think regular folks are already pretty clued into how other things like text mining and sentiment determination have big markets, considering how strong the political backlash against tech is right now. The everyday public, and by extension politicians, seem reasonably capable of imagining how data can be monetized for ads, and they don't seem too happy about it.


Really? I have the opposite, less positive impression. I hope you are right though.


I think it depends on how you define "clued in". They are aware of its existence, but they are not just ignorant of how it works; they are outright apathetic and hostile to anything that goes against their personal narrative.

Just look at the "Google is selling your data" claim being uncritically accepted, when five minutes of thought would conclude it is the last thing Google would want (even a rival with a better search algorithm would find it hard to bootstrap a comparable user base and training data), or the casual, John Yoo-worthy torture of the definition of monopoly to include Goddamned Netflix when whining about FAANG monopolies. That level of generalization and stereotyping is like blaming the Amish for flying planes into the World Trade Center because both are radical Abrahamic religions.


From the "applied physics" department of Johns Hopkins University in Baltimore (last stronghold of the JASONs), south to Virginia and the Research Triangle Park area of North Carolina, you will find people who know things about practical NLP systems that aren't in the open literature. They could tell you about it but they'd have to kill you.

Around Mumbai I know there is a crew that can really use UIMA, and there are other Indians I know who do intelligence and defense work.


Both DeepMind and OpenAI were founded on the premise that we are in an AI overhang. OpenAI, in particular, believes in scale. Scale will get us there based on the algorithms we have, such as the Transformer. With each new release, they add evidence that they were correct.

The call for legislation neglects that there exists a global arms race to make this technology succeed. Legislation in one nation will simply handicap that nation. Against that backdrop, legislation is probably unlikely among the nations already leading in AI.


> With each new release, they add evidence that they were correct.

Is it though? If the goal is human-level AI, or hell, even rat-level AI, the evidence is pretty convincing that you should be able to train and deploy it without requiring enough energy to sail a loaded container ship across the Pacific Ocean. Our brains draw about 20 watts, remember. This suggests to me that no, in fact, scale will not get us "there".

https://www.forbes.com/sites/robtoews/2020/06/17/deep-learni...


I don’t actually know if this is true, but the intuition I have is that this huge expenditure of energy is just the result of speeding through evolution. Our neural structures have evolved for hundreds of millions of years. The aggregate energy cost of that evolution has been enormous, but the result is a compact, hyper-efficient brain. Who’s to say that on the other end of this we’re not going to end up with the same in silicon?


Even if that were the case, it seems wasteful/pointless to have to go through all of biological evolution every time one wants to label images or generate some text.


You provide pre-trained models to then train further.

This happens now.


That 20 watts is to run the network. Our brain has had a billion years to work out details of the architecture and encode a lot of basic stuff as instinct (and it still sucks at a lot of things). You should be counting that energy cost as well - we didn't get from nerve nets to frontal lobes overnight.


This exactly, I am so tired of reading these posts online that ignore the billions of years the human brain took to evolve


*millions, not billions.

Earth is about 4.5B years old, life is about 3.7B years old, multicellular life (including life with neural nets) is about 600 million years old. I don't think the span from microbe to multicellular organism counts in brain evolution.


Do you want artificial intelligence or do you want energy efficiency? Personally, I think this work is about proving that we can create the former. Making it small and efficient comes later. That has been true of many advances in technology, and I see no reason why it should not apply here. I find it hard to believe that present energy consumption is evidence that we cannot create human or rat-level AI.


Training an AI in 2020 is best thought of as a capital investment. Like digging a mine or building a wind farm, the initial investment is very large but the operating costs are much lower, and in the long run you expect to get a lot more money out - a lot more value out - than you put in.

Training GPT-3 cost $5m; running it costs about $0.04 per page of output.


If it is scaled up by 1000x as the article proposes, does that mean it will cost $40/page of output? Or does the additional cost just go into training the model?


If it's 100x from increased investment and 10x from short-term efficiency gains, yeah you'd expect $4/page. Model compression or some other tech might make it more efficient in the long-run.
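
As a back-of-envelope check of those numbers (the figures are just the ones quoted in this thread, not official pricing):

    # ~$0.04/page today; a 1000x compute scale-up split into 100x more spend
    # and 10x better efficiency, per the comment above.
    cost_per_page = 0.04
    compute_scaleup = 1000
    efficiency_gain = 10

    naive = cost_per_page * compute_scaleup        # $40/page if cost scaled linearly
    with_efficiency = naive / efficiency_gain      # $4/page after the 10x efficiency gain
    print(f"naive: ${naive:.2f}/page, with efficiency gains: ${with_efficiency:.2f}/page")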


Once the investment is recouped and a small margin rewarded, the value should be spread equitably among society.


Why only a small margin?


> , the evidence is pretty convincing that you should be able to train and deploy it without requiring enough energy to sail a loaded container ship across the Pacific Ocean.

Yes, and airplanes use much more energy to fly than a bird. What has that got to do with the airline industry?


Yeah, but the plane uses much more energy, so it's not really flying.


Who cares how much power it needs? Plug it into a hydroelectric dam. A superhuman AI would surely provide higher ROI than the terawatts used for smelting aluminium.


You're missing my point. If it's possible to achieve general AI with incredibly minimal computational requirements, then this implies that current methods which rely on some sort of teraflop arms race to achieve better results are based on a fundamentally flawed model.


General intelligence in its biological form was achieved with hundreds of millions of years evolution, which required the "evaluation" of trillions and trillions of instantiations of nervous systems. The total energy consumption of all those individual organisms was many many orders of magnitude more than all of the energy that has been produced by the entirety of humanity.


The compute intensive methods are likely to deliver results much faster.

http://incompleteideas.net/IncIdeas/BitterLesson.html


Our brains are mostly an already-trained network though. Running a model that has been trained is the easy part.


I'm well aware of the distinction between training a model and running it. Look, GPT-3 has 175 billion parameters. Modern low-power CPUs will get you about 2 GFLOPS/watt [1]. So even if all GPT-3 did was add its parameters together, it would take multiple seconds on an equivalently powered CPU to do something that our brains do easily in real time. It's not an issue of processing power; an 8086 from 40 years ago easily runs circles around us in terms of raw computational power. Rather, it's that our brains are wired in a fundamentally different way than all existing neural networks, and because of that, this line of research will never lead to GAI, not even if you threw unlimited computing power at it.

[1] http://web.eece.maine.edu/~vweaver/group/green_machines.html
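
A quick sanity check of that arithmetic (all numbers are the rough estimates from the comment above):

    params = 175e9               # GPT-3 parameter count
    brain_power_w = 20           # watts, rough estimate for a human brain
    flops_per_watt = 2e9         # ~2 GFLOPS/watt for a low-power CPU, per [1]

    budget_flops = brain_power_w * flops_per_watt   # ~4e10 FLOPS at brain-level power
    seconds = params / budget_flops                 # time just to touch each weight once
    print(f"{seconds:.1f} s just to add every parameter once")   # ~4.4 s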


Birds are wired in a fundamentally different way than all our existing computers thus we will never have fly-by-wire, not even if we throw unlimited computing power at it.


Actually that's a great example. For centuries men labored (and died) trying to build ornithopters--machines that flap their wings like birds--under the mistaken impression that this was the secret to flight. Finally, after hundreds of years of progressively larger and more powerful, but ultimately failing designs, the Wright brothers came along and showed us that flight is the result of wing shape and pressure differentials, and has nothing whatsoever to do with flapping.

GPT-3 and whatever succeeds it are like late-stage ornithopters: very impressive feats of engineering, but not ultimately destined to lead us to where their creators hoped. We need the Wright brothers of AI to come and show us the way.


Perhaps our artificial representations of neurons are simply much less energy-efficient than biological neurons?


If someone created AGI that ran on 1 kW, you would deem it a rank failure by that metric (off by a factor of 50!).


The global arms race for AI has definitely started. Unfortunately, most states don't appear to be aware of this.


I've been pandemic-rewatching Person of Interest. It's really quite shocking how much more relevant it feels today compared to when it aired just a few years ago.

It's fun looking at things like GPT-3 and imagining how they could be used to build the surveillance AI at the heart of Person of Interest.

(If you haven't watched Person of Interest yet, here's my pitch for it: it's a CBS procedural where the hook is that an engineer built a secret, surveillance feed tracking AI for the government after 9/11 - but he cared about civil liberties, so he built it as an impenetrable black box. All it does is kick out the SSN of someone who is about to be either the victim or the perpetrator of a terrorist attack - which means government agents still have to investigate what's going on rather than taking the AI's word for it. "The Machine" also sees victims/perpetrators of violent crimes - but the government don't care about those. Finch, the machine's inventor, does - so he fakes his own death, hooks into a backdoor into the machine that gives him those SSNs and sets up a private vigilante squad to help stop the violent crimes from happening. So that gives you the "case of the week". Only it's actually an extremely deep piece of philosophical science fiction disguised as a case-of-the-week procedural, and as time goes on the plots become much more about AI, the machine, attempts to build rival machines, AI ethics and so on. It's the best fictional version of AI I've ever seen. The creative team later worked on Westworld.)


I honestly don't think the show has much to do with AI at all and is more like a retelling of the Greek classics in a sci-fi wrapping, which is actually something that comes up in the show at several points explicitly.

The AIs in the show very quickly turn into godlike characters with anthropomorphic personalities, and the real-world issues of AI such as surveillance, economics and so on are all dealt with in very shallow fashion. I had the same issues with Westworld too. It turns from an AI premise into a classical Christian morality tale very fast. ("we need to suffer to become conscious").


One of the things I loved about the show is that different characters have different philosophies concerning AI, and they argue about them. Nathan vs. Finch. Finch vs. Root. Control, Greer - for the most part the show tried to give some depth and background to their thinking around the implications of what they were responsible for.

Way smarter than you would expect from a CBS procedural!


That's Jonah Nolan's show right?


Yup, he did it before Westworld.


I thought the overhang was going to be along the lines of the following, whether realistic or not:

-GPT-3, as is, should be the inner loop of a continuously running process which generates 1000s+ of ideas for "how to respond next" to any query, with a separate network on top of it as the filter which cherry-picks the best responses (as humans are already doing with the examples they are posting)

-Since GPT-3, as is, can already predict both sides of a conversation, it can steer a conversation toward a goal state just like AlphaGo does by evaluating 1000s+ of potential moves, lots of potential responses and counter-responses until it finds the best thing to say in order to get you to say what it "wants" you to say.

It seems ready to go as the initial attempt at the inner loop of both of these tasks (and more) without modification or retraining of the core network itself, no?
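
A minimal sketch of that outer loop, assuming a generate_candidates call standing in for the language model and a score_candidate call standing in for the hypothetical filter network (both are placeholders I made up, not real APIs):

    import random

    def generate_candidates(prompt, n):
        """Placeholder for sampling n continuations from the language model."""
        return [f"{prompt} ... candidate response #{i}" for i in range(n)]

    def score_candidate(prompt, candidate):
        """Placeholder for the separate filter/ranker network described above."""
        return random.random()

    def respond(prompt, n=1000, keep=3):
        candidates = generate_candidates(prompt, n)
        ranked = sorted(candidates, key=lambda c: score_candidate(prompt, c), reverse=True)
        return ranked[:keep]    # cherry-pick the best few, as humans do by hand today

    print(respond("How should I reply next?", n=10, keep=3))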


I'd love to see what could be done with GPT-3 as part of a GAN. Text compression/summary, maybe?


I was also thinking something like this. GPT-3 should be the internal monologue, the subconscious soup of words constantly exploring random thought alleys, and there should be another layer on top of it to bridge it with the outside.


GPT-3 is the first AI system that has obvious, immediate, transformative economic value.

I think the jury is still out on this one. It certainly seems powerful, it's doing interesting things, and it's better in many ways than any system that has come before. But there's a difference between exciting demos and transformative economic value.

It's too soon to be sure, but to me, the most interesting question is whether any valuable startups will be built on top of GPT-3. Some leading indicators before that are whether useful products are built on GPT-3, and whether early-stage startups built on GPT-3 get seed investment. I'm not aware of any of these yet but maybe latitude.io counts as one.


In Kernighan and Pike's "The Practice of Programming" there's a chapter that covers the implementation in different languages of a random text generator using markov chains. It's a nice exercise and a lot of fun to play with.

I'm guessing that not many people have read that book, because I'm seeing here and elsewhere even technical people talking about GPT-3 as if it's heralding the imminent advent of SkyNet. I get that transformers have a somewhat longer attention span than markov chains, but it's still a statistical language model. It can't even do the kind of planning or reasoning that early AI demos like SHRDLU could.
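
For anyone who hasn't read it, the exercise boils down to something like this word-level sketch (simplified from the book's version, which builds a table keyed on two-word prefixes):

    import random
    from collections import defaultdict

    def build_chain(text, order=2):
        """Map each run of `order` words to the words observed to follow it."""
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def generate(chain, length=30):
        state = random.choice(list(chain))
        out = list(state)
        for _ in range(length):
            followers = chain.get(state)
            if not followers:
                break
            nxt = random.choice(followers)
            out.append(nxt)
            state = (*state[1:], nxt)
        return " ".join(out)

    corpus = "the quick brown fox jumps over the lazy dog and the quick red fox runs off"
    print(generate(build_chain(corpus), length=15))

Feed it a large corpus and the output is locally plausible but globally meaningless, which is exactly the comparison being drawn here; the open question is how much further a transformer's longer context actually gets you.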


It's ridiculous to equate this to a traditional Markov chain language model. Here's something a Markov chain certainly cannot do:

Human: I want to test your creativity. Please invent a new word and give its meaning.

GPT-3: Ok. Um... Tana means to hit someone with the intention to wound them.

Human: Please use the word tana in a sentence.

GPT-3: You are about to tana the man attacking you.

Human: Speak like a dwarf.

GPT-3: I ain't talkin' like a dwarf.

https://www.reddit.com/r/MachineLearning/comments/hvssqn/d_g...


Right, I admit that I don't know the first thing about ML, so I tried an experiment.

Consider a language with the tokens "{[()]}" and the following grammar:

S := S S | '{' S '}' | '[' S ']' | '(' S ')' | <empty>

That is, "[()]" and "[]()" are valid sequences, but "[(])" or "))))" aren't. A child would quickly figure out the grammar if presented some valid sequences.

I generated all 73206 valid sequences with 10 tokens and used it as input to the RNN text generator code at http://karpathy.github.io/2015/05/21/rnn-effectiveness/. After 500,000 iterations I'm still getting invalid sequences.

Am I doing something stupid, or is an RNN text generator weaker than a child (or a pushdown automaton)? Is GPT fundamentally more powerful than this?
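
For reference, here's a sketch of a correct recognizer and generator for that grammar; the language is exactly "balanced brackets", so a stack (a pushdown automaton) suffices for checking:

    PAIRS = {"{": "}", "[": "]", "(": ")"}

    def is_valid(s):
        stack = []
        for ch in s:
            if ch in PAIRS:
                stack.append(PAIRS[ch])
            elif not stack or stack.pop() != ch:
                return False
        return not stack

    def balanced(n_pairs):
        """Enumerate every balanced string containing exactly n_pairs bracket pairs."""
        if n_pairs == 0:
            return [""]
        out = []
        for i in range(n_pairs):                   # pairs enclosed by the first bracket
            for inner in balanced(i):
                for rest in balanced(n_pairs - 1 - i):
                    for o, c in PAIRS.items():
                        out.append(o + inner + c + rest)
        return out

    print(is_valid("[()]"), is_valid("[(])"), is_valid("))))"))   # True False False
    print(len(balanced(2)))   # 18 balanced strings of length 4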


GPT-3 can generate well-formed programs, so yes, it does things well beyond this complexity.

> After 500,000 iterations I'm still getting invalid sequences.

How frequently? If it's only the occasional issue it might be down to the temperature-based sampling that code uses, which means it will, with some small probability, return arbitrarily unlikely outputs.


How can it do that? Did it read “tana” and the meaning somewhere?


I suspect people overestimate the intelligence because they just can't grasp how much data it's ingested or don't have a visceral sense of what an ocean of data can contain. There's a saying that "quantity has a quality all its own".


I don't think this is an overestimation of intelligence. That ability is itself intelligence.


It's a good book, and one most programmers should read early on (it grows less useful over time), but to think of this as only an incremental improvement over markov chains is underselling the advance. The technology is different, and can scale to much higher levels of capability. AGI levels? Almost certainly not. But it's passing usefulness thresholds so things can progress elsewhere.


>to think of this as only an incremental improvement over markov chains is underselling the advance.

Erm, citations needed. It's a giant, inefficient and shitty KNN model, which is capable of mimicking markov chains. Wonderful marketing achievement and not much else.


https://www.gwern.net/GPT-3 (Edit: in case it's not clear, I suspect if you give an honest perusal of that page, and the pages it links, and the pages they link, you'll come away with a different opinion.)


I'm totally with you. I find GPT-3 incredibly impressive, but people are acting as if GPT-3 is a harbinger of AGI.

This is especially obvious on stuff like lesswrong, where AI is a big part of what they talk about. I tend to agree with the LW/SSC crowd about the negative effects of AGI, but they are being so hyperbolic about GPT-3.


Nah, we're in for the next AI winter. GPT-3 shows how much energy is needed to perform a nice trick with current technology. We have mostly reached the limits of the technology. Investing more compute power for a few percentage points more precision is not going to bring the technology forward.


2017 SOTA on Penn Treebank was 47.69 perplexity. GPT-3 is at 20.5. AI has already been productized on consumer devices through Siri, Google Assistant, speech detection, speech generation, textual photo library search, similar data augmentations for web search, Google Translate, recommendation algorithms, phone cameras, server cooling optimization, phone touch screens' touch detection, video game upscaling, noise reduction in web calls, file prefetching, Google Maps, OCR, and more. DLSS alone justified continued investment by NVIDIA. NVIDIA Ampere will be ~6x as fast at running consumer-targeted models as Turing, given raw throughput increases compounded with sparsity and int8 hardware. A huge number of research threads around AI have direct applicability to large tech companies.


I'm not arguing that current machine learning technologies are not useful. I'm just arguing that progress is based on increasing some metric, usually through a trade-off of computation. This can even make ML techniques applicable to some new fields, but it's not what is holding back autonomous driving, the often-touted flagship example, which also provides a lot of employment for machine learning.

This article clearly sits on the peak of inflated expectations in the hype cycle.

https://en.wikipedia.org/wiki/Hype_cycle


It's not just that you're not arguing it isn't useful; as far as I can tell, neither of your comments contain an argument against ML at all. I have nothing to meaningfully argue against.

ML is undergoing a Cambrian explosion of use-cases (see my prior comment), almost all of this over an incredibly small time period, progress is accelerating, and many of these use-cases are incredibly high value. Scale is not proving a major stopper; Google's MoE experiments show that huge models are productizable, and small models work plenty fine too in restricted places, to the point where they're literally used to parse touch screen sense data in phones.

If you want to claim we're in for another AI winter, you need a vastly stronger argument than ‘something something hype cycle’.


Agreed. Nearly every AI startup or idea has disappointed or failed. We’re spending the equivalent of billions of dollars on something dumber than a rat in most cases.


If you look at the paper doesn't it scale well across many different metrics?


It's a one-time investment though, and not that large relatively speaking. Is it that much more costly and less rewarding than other investment opportunities?


I wonder if at some point the amount of extra data necessary to achieve an n-fold improvement will outstrip what we can provide.

I think the time for AI legislation is now - before FAAMG deploys something like the next-gen of GPT-3. Of course with the legislative lag that exists even for decade-old tech I don't have the highest confidence in this being achieved by a federal government in the state it is in now.


On that data point: I wonder if anyone can comment on how much useful training data we could get out of generating text based on knowledge graphs/databases that we have. You can construct an awful lot of sentences out of just a few facts (e.g. weights of various classes to generate sentences like: "x's are heavier than y's, but not as heavy as z's"). All the variations would contain the same information (or subsets of it), but the same could be said of lots of text online. Obviously this is an inefficient way to incorporate the databases into a GPT-like model, but it might make sense economically given the race that is now playing out - just shoehorn it in or you'll be left behind (at least in the short term) by those who do. "We can work out how to make it efficient after we're rolling around in cash."

The knowledge databases could be used to generate what would essentially be "word problems" (in math classes), starting with simple things like "If I put three marbles in a cup, and then I take one out, and each marble weighs 20g, then the remaining marbles weigh 40g in total" and moving on to progressively more complex ones.

If that were to happen, then you'd see companies employing people to create templates which essentially convert databases into sentences/paragraphs, which can then be consumed by the GPT-like model.

It seems like this data would need to be used in a sort of pre-training step though, because you want the model to encode all the relationships, but you don't want it to learn to generate these types of concrete sentences, specifically.
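
A toy version of that template step, with an invented fact table (nothing here reflects a real pipeline):

    import random

    # Illustrative "knowledge base": approximate weights in kilograms.
    WEIGHTS_KG = {"mouse": 0.02, "cat": 4.5, "dog": 20.0, "horse": 500.0}

    TEMPLATES = [
        "A {a} is heavier than a {b}.",
        "A {b} weighs less than a {a}.",
        "A {a} weighs about {wa} kg, while a {b} weighs about {wb} kg.",
    ]

    def generate_sentences(n=5):
        sentences = []
        for _ in range(n):
            a, b = random.sample(list(WEIGHTS_KG), 2)
            if WEIGHTS_KG[a] < WEIGHTS_KG[b]:
                a, b = b, a                     # keep `a` as the heavier animal
            template = random.choice(TEMPLATES)
            sentences.append(template.format(a=a, b=b, wa=WEIGHTS_KG[a], wb=WEIGHTS_KG[b]))
        return sentences

    print("\n".join(generate_sentences()))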


As blueeyes has already pointed out, "Legislation in one nation will simply handicap that nation." I don't have a lot of faith in our legislators' ability to legislate safety without relegating us to an AI backwater.


You're right. I think ideally the legislation should be international. Maybe something like the Washington Naval Treaty that set an upper limit on the tonnage and armament of new battleships. Or perhaps more aptly something akin to SALT I & II where older models are taken offline to avoid derelict AI systems from falling into malicious hands and to keep the number from growing out of control. Although this parallel is somewhat weak considering the capabilities of one advanced model are more valuable than 10x models of the last generation.

Theoretical wishful thinking, I suppose, but I strongly believe that corp/govt scale ML research should be treated like advanced weaponry because it isn't a matter of if but when AI will be weaponized (whether the flavor of warfare is physical or informational).

Although of course as with weapons treaties - the major powers would likely tend to be selective in what they commit to limiting themselves in.


The world couldn't even come together on controlling 3D printed weaponry, there's no hope for an arms treaty for AI right now. The "it's not feasible to regulate even if you tried" stance applies too -- you can restrict central actors without much difficulty, and that would work for AI just as well as it works for battleships, but there's a lot of distributed compute whereas there's not a lot of distributed shipyards. Like, you just have to follow what's been done with anime image nets to see that something like GPT-3 is possible for a distributed worldwide group to achieve and is not limited to firms or governments.

Maybe when we have a disaster directly attributable to AI, nations can get on-board with something like the BWC and CWC. Until then, be even more pessimistic. (If you want a fun if rather dry book to read on material technology developments that were in the pipeline a couple decades ago, some of which have come to fruition, as well as some policy recommendations for the technologies that aren't generally good, check out Jürgen Altmann's Military Nanotechnology.)


>The world couldn't even come together on controlling 3D printed weaponry

Beg pardon? Plastic guns have been banned in the US since 1988 https://en.wikipedia.org/wiki/Undetectable_Firearms_Act

I assume other countries have similar bans.


As a small amount of metal can be added at the end to make the weapon 'legal', that act does little to address the numerous¹ problems beyond being able to sneak a gun past airport security. Hardly an important milestone in controlling anything. It didn't even affect any gun in existence at its time.

But more generally, as we all know, a ban without provisions for enforcement is useless. Compare to the CWC (Chemical Weapons Convention) which I point to as one of the best pieces of international "coming together" via treaty. It includes requirements that member countries submit to inspections from its enforcement body (OPCW) and furthermore that countries can request the OPCW inspects another member country if they suspect non-compliance. It also includes restrictions on transfer of various chemicals in order to incentivize non-member countries to become members so they can purchase chemicals for industrial purposes from other members.

¹ and bigger, if you're modeling this from assumptions where it's a problem at all -- not everyone thinks it is, "an armed society is a polite society" etc.


Legislation? International treaties?

An AI-risk maximalist would believe AI is a near-term existential threat, with the prospect of total human extinction. In that scenario, the final backstop measure to a rogue country engaging in AI research is using nuclear weapons.

This... obviously... would be very bad. If it escalated to a full nuclear war, it would kill billions of people. But it would leave survivors, who wouldn't be interested in, or be able to, pursuing AI for decades or centuries. Better than the alternative.


One thing I don't hear a lot of people talking about is ML/AI systems in the hands of government agencies. We know that the military and NSA are often ahead in many technologies, but when it comes to AI the assumption seems to be that the industry is moving faster than the government. Is that really a safe assumption?

The government is openly using autonomous systems to pilot drones, but what else are they leveraging AI for? Threat analysis? Logistics? Weapons optimization? PsyOps?

The DoE is openly a very large consumer of GPUs. What about the military?


You can get a glimpse by scrolling websites like: https://www.darpa.mil/opencatalog?ppl=view200&sort=title&ocF... [.mil] and looking at DARPA and Office of Naval Research sponsored ML/AI research. The military has been deeply involved with ML/AI research since its inception, and it is near impossible to avoid first - or second degree involvement, if active in ML/AI.

The military wants: automated chat agents/web users that can be sent to dark web markets and hacker IRC channels and report back intelligence. Common-sense inference from security and drone footage: predict who the killer is when watching a movie. Author deanonymization and cross-device tracking. Global-scale 99.9%+ accurate face detection.

The Dutch Intelligence Agency organizes a yearly competition with difficult codes to crack. [1] It is rare for someone to answer all questions correctly. The answers require logic, creativity, common sense, linguistics, causal inference, spatial reasoning, expertise, analysis, and systematic thinking. I bet the military would be mighty interested in an automated problem solver for that. And mighty scared some other country gets there first.

[1] https://www.aivd.nl/onderwerpen/aivd-kerstpuzzel


> Is that really a safe assumption?

Not at all. The government can throw billions of dollars at a problem that, if solved, will never turn a profit or immediately benefit a business.


Let me flip this argument on its head. Consider this: About 5 years ago several key SV people including Sam Altman, Peter Thiel and Elon Musk became suddenly very concerned about AI ethics and started OpenAI. What if they, with this insider status, had already seen a GPT-3-like system at Google, Facebook, Baidu or wherever, and its capabilities for political and social manipulation so concerned them that they started OpenAI in an effort to bring this tech out of the shadows and into the sunlight so we could debate it and regulate it? GPT-3 might not be a state-of-the-art breakthrough. It could be just catching up with where the big tech companies were 5 years ago so that we can finally see what they are capable of. Corporate secrets are a normal part of doing business, and maybe the tech companies didn't like the PR they would have gotten from publicizing something like this. Remember the blowback from Google's project that called businesses for their store hours? They already struggle with regulators across the world as it is. Do we really believe that little OpenAI is so much farther ahead of Google, as the posted article posits?


You don't need secrets for your theory of insiders to work. What Thiel and Musk saw was DeepMind. Then they invested in it. Altman acted later.

And the vision of what AI was becoming was voiced much earlier by Yudkowsky, whose Singularity Institute received funding from Thiel.

If anything, they heard what a few prophets were shouting. They saw some early demos in a startup pitch. They responded and DeepMind's work soon became as public as AI research is. That is to say, most people ignored it until the Google acquisition.


Yudkowsky's outfit is the Machine Intelligence Research Institute, not the Singularity Institute. The latter might be Bostrom? Can't recall.


MIRI was, until 2013, called the Singularity Institute for Artificial Intelligence. Apparently the name change was part of a deal with Singularity University to avoid brand confusion. Announcement here:

https://intelligence.org/2013/01/30/we-are-now-the-machine-i...


Thank you.

Thiel was an early backer of SI and attendee of the Singularity Summits that ran from 2006-2012.

https://en.wikipedia.org/wiki/Singularity_Summit

Soon after that, awareness of AI and superintelligence went mainstream, and we got FHI, FLI, etc.

I don't know if Thiel backs MIRI as he did SI. Arguably, he doesn't need to. He made his money on DeepMind and helped trigger a larger movement, and other institutions with a lot more resources, like Alphabet and MSFT, carry forward the torch.


Thiel wasn't an attendee, he was one of the people running the conference


He was an attendee and sponsor, and ran the conference.


from Gwern, @ https://www.gwern.net/newsletter/2020/05: "This year, GPT-3 is scary because it’s a magnificently obsolete architecture from early 2018, which is small & shallow compared to what’s possible, with a simple uniform architecture trained in the dumbest way possible (unidirectional prediction of next text token) on a single impoverished modality (random Internet HTML text dumps) on tiny data (fits on a laptop), sampled in a dumb way, and yet, the first version already manifests crazy runtime meta-learning—and the scaling curves still are not bending!"

It's probably not a state-of-the-art breakthrough at this point. Who knows what OpenAI has done in the intervening two years?


Sounds like a nice conspiracy, but realistically, how could they hide something like that? Presumably there are hundreds or more employees working on this. If Elon Musk et al. heard about it 5 years ago, this must be one of the best kept secrets in recent history.


It's best to assume that Google et al. are not hiding anything "huge". What you see publicly is what there is. DeepMind used to be years ahead, but now they appear to have been, in some very important respects at least, leapfrogged by OpenAI, initially an imitator. It would be interesting to know if DeepMind has indeed squandered their lead, and if so, why it happened.


I'm not sure this qualifies as a conspiracy theory, maybe more an open secret. There's not anything out of the ordinary about a company only publicizing things that are good for PR. I remember reading a number of articles about 5 years ago hinting that Deepmind was onto something much bigger than classifying cat pictures. NDAs are usually quite effective at protecting trade secrets. And if you go back and read interviews from around the time of OpenAI's founding and read between the lines, something clearly had concerned Musk and Altman enough to get invested in AI ethics. I remember thinking at the time that they didn't seem vaguely concerned, but specifically concerned, very much like they had seen a demo that spooked them.

Here's an example: https://www.theverge.com/2016/6/2/11837566/elon-musk-one-ai-...

Read past the fluff and the skynet. He's telling us Google scares him and makes him concerned for democracy.


> how could they hide something like that?

Very carefully. I mean, that's not much of an argument. Lots of stuff is successfully kept secret. The US managed to keep a lid on their surveillance for decades (iirc) before the lid got blown on that, and people used to give the same argument you are in that context, too.

What's the alternative? Do you think megacorps never keep illicit things under wraps for extended periods of time?


There is a formula for working out how long it will take for a conspiracy to be made public based on how many people are involved in it.

I guess you could use it in reverse and produce an upper limit on how many people could be involved in a conspiracy if you assume that it has been secret for five years.

That said, Elon Musk gives every impression of being a massive chatterbox who can't keep his mouth shut even when it's the SEC threatening to take Tesla away from him, so I very much doubt any conspiracy involves him.


For those interested, the equation for estimating how long a conspiracy would last was published by a Dr. Grimes in 2016 [0]; it's pretty interesting :)

[0] https://journals.plos.org/plosone/article?id=10.1371/journal...


I would love to see a follow up to that. From what I recall, the PRISM program they used in that study wasn't the only one of its kind, and anything from the 80s or earlier wasn't included. They also didn't make mention of any of the CIA's old projects from the 70s / 80s that get declassified every few years. I wonder how much those would skew the results.


The paper is definitely a worthwhile read; it would be a fun exercise to go through other historical examples of whistleblowing and unmasked conspiracies (e.g. the Panama Papers, Chelsea Manning's WikiLeaks disclosures, MKUltra, &c) and see how well the authors' parameter estimates hold up.


And no mention of the Manhattan Project at all? We built secret cities for that.


True. Many thousands of people have security clearances in the US. The penalty for breaking security clearance is harsh and many have done so but still, there have to be tons of secrets kept by the state. Not a big stretch to imagine companies convincing people to keep secrets.


Most companies default to secrecy. What's the recipe for Kentucky Fried Chicken? What problem is holding Waymo back specifically, right now? How much advertising business, in dollars, does Facebook take from political PACs? Who will be Biden's VP pick? What will Apple's next iPhone look like? What new streaming show is Disney about to reveal? Capitalism runs on information asymmetry.


the secrets being kept are the ones that aren’t very interesting.


Nick Bostrom's 2014 book Superintelligence is the reason that many big name figures started to take the threat seriously.


Five years ago? Peter Thiel hosted the Singularity Summit in 2006, and a major premise of that summit was AI Ethics...


>"bring this tech out of the shadows and into the sunlight so we could debate it and regulate it"

Ahh... I'm not so sure; see the comment by blueyes. Had it been OpenAI's goal to engage in debate and regulation for this technology, they would have been vocal about that aspect of their work already.


That's fair, and looking at some of the founding statements it seems they were much more interested in getting the tech out there and available to everyone equally, and didn't seem to be big believers in regulation. I think I am projecting my own belief in the importance of regulation onto this.


"Flipping this argument on its head" = repurposing a detailed, quantitative cost analysis to argue we're not in overhang.

"Not reading OP and spouting a non-quantitative conspiracy theory" = what you did.


You are absolutely right! I did read TFA, but it is knowledgeable and well considered so I can't address it specifically. It is still fun to challenge one of its assumptions, namely that Google is now playing catch-up. It's standard devil's advocate stuff, but seemed a fine vein to mine. Yes, Google is probably now behind in language generation vs OpenAI, but what if they weren't and how far might that go?


>so we could debate it and regulate it.

but you'd have no way of enforcing any treaty so that's a moot point and they would know this.

I think making people aware of the importance of the control or value loading problems is a much better use of efforts.


Or maybe, just maybe, it was and still is possible with public evidence and independent thinking to understand the importance of AGI and AI ethics.


u/Gwern's [explanation on why Google didn't produce GPT-3 earlier](https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-...):

As far as I can tell, this is what is going on: they do not have any such thing, because GB and DM do not believe in the scaling hypothesis the way that Sutskever, Amodei and others at OA do.

GB is entirely too practical and short-term focused to dabble in such esoteric & expensive speculation, although Quoc's group occasionally surprises you. They'll dabble in something like GShard, but mostly because they expect to be likely to be able to deploy it or something like it to production in Google Translate.

DM (particularly Hassabis, I'm not sure about Legg's current views) believes that AGI will require effectively replicating the human brain module by module, and that while these modules will be extremely large and expensive by contemporary standards, they still need to be invented and finetuned piece by piece, with little risk or surprise until the final assembly. That is how you get DM contraptions like Agent57 which are throwing the kitchen sink at the wall to see what sticks, and why they place such emphasis on neuroscience as inspiration and cross-fertilization. When someone seems to have come up with a scalable architecture for a problem, like AlphaZero or AlphaStar, they are willing to pour on the gas to make it scale, but otherwise, incremental refinement on ALE and then DMLab is the game plan. Because they have locked up so much talent and have so much proprietary code and believe all of that is a major moat to any competitor trying to replicate the complicated brain, they are fairly easygoing.

OA, lacking anything like DM's long-term funding from Google or its enormous headcount, is making a startup-like bet that they know the secret: the scaling hypothesis is true and very simple DRL algorithms like PPO on top of large simple architectures like RNNs or Transformers can emerge and meta-learn their way to powerful capabilities, enabling further funding for still more compute & scaling, in a virtuous cycle. And if OA is wrong to trust in the God of Straight Lines On Graphs, well, they never could compete with DM directly using DM's favored approach, and were always going to be an also-ran footnote.

While all of this hypothetically can be replicated relatively easily (never underestimate the amount of tweaking and special sauce it takes) by competitors if they wished (the necessary amounts of compute budgets are still trivial in terms of Big Science or other investments like AlphaGo or AlphaStar or Waymo, after all), said competitors are too hidebound and deeply philosophically wrong to ever admit fault and try to overtake OA until it's too late. This might seem absurd, but look at the repeated criticism of OA every time they release a new example of the scaling hypothesis, from GPT-1 to Dactyl to OA5 to GPT-2 to iGPT to GPT-3... (When faced with the choice between having to admit all their fancy hard work is a dead-end, swallow the bitter lesson, and start budgeting tens of millions of compute, or between writing a tweet explaining how, "actually, GPT-3 shows that scaling is a dead end and it's just imitation intelligence" - most people will get busy on the tweet!)


ʸᵉˢ

Explainability in AI is really overlooked and often skipped over, as there is little progress in this area. GPT-3 is essentially GPT-2 plus tons of data, compute and parameters, and yet it still cannot explain why it generates the 'human-level' text it does, much like how AlphaGo can't explain why it played move 37. Not discrediting these achievements, but explainability is just as important in these AI models.

Once you have an AI-based 'auto-pilot' in any vehicle, the importance of AI explainability will haunt manufacturers when regulators want them to explain why the 'AI' took a particular decision and they're unable to do so.

I hope GPT-4 isn't just going to be GPT-3 + 1000x the data. Otherwise nothing would have changed here other than the parameters and data.


The easy solution is to copy what the human brain does: just make up something plausible. There's pretty good evidence that we don't have great introspective access to much of our own internal processing. We just paper it over as "intuition" or "judgement."


What does explainability mean for superhuman AI?

We already see this in superhuman stock algorithms. You can "debug" them, in the sense that for a given trade, it can tell you what signals provoked it. But they don't make any sense: it saw rainfall in the Amazon tick up, the price of beef in Russia tick down, and the UK call a snap election, so it bought more GE stock.

You could... theoretically... write a story that connected those dots, but it will either be facile or nonsensical. That's because the model of the market the algo has is bigger and more complete than anything a human can have. It's drawing a straight line through some higher-dimensional manifold that you can't comprehend.

It can't explain what it's doing to you any more than you can explain "algorithmic stock trading" to a three-year-old child. You can say what the outcome was, but you can't explain it in such a way that the kid could replicate the performance.


> The current hardware floor is nearer to the RTX 2080 TI's $1k/unit for 125 tensor-core TFLOPS, and that gives you $25/pflops-d.

It's definitely true that the RTX 2080 Ti would be more efficient money-wise, but the Tensor Cores are not going to get you the advertised speedup. Those speedups can only be reached in ideal circumstances.
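
For what it's worth, the $25 figure seems to assume something like a year of continuous use per card (the amortization period is my guess, not stated in the article):

    card_cost_usd = 1000
    tensor_tflops = 125                  # advertised peak, rarely reached in practice
    amortization_days = 320              # assumed: roughly a year of near-continuous use

    pflops = tensor_tflops / 1000                        # 0.125 PFLOPS sustained, best case
    pflops_days = pflops * amortization_days             # ~40 PFLOPS-days over the card's life
    print(f"${card_cost_usd / pflops_days:.0f} per PFLOPS-day")   # ~$25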

Nevertheless, the article as a whole makes a very good point. The thing that is most scary about this is that it would become very hard for new players to enter the space. Large incumbents would be the only ones able to make the investments necessary to build competitive AI. Because of that, I really hope the author isn't right - unfortunately they probably are.


OpenAI is kind of a new player. Well-funded, but still - there's a lot of money available for these kinds of exponential opportunities.


> GPT-3 is the first AI system that has obvious, immediate, transformative economic value.

What is its economic value? What does it transform? I've been trying to figure that out since I heard about it.

Anyone have any ideas?


Theory based on some output I've read:

It's good enough to actually start replacing a lot of customer service jobs. Not just being a shitty annoyance like current bots but being useful in that it will be as flexible as a human, directing you to good help via vague terms, potentially being smart enough to refer you higher up if necessary.

Getting rid of all those screening call center employees is potentially very lucrative.


I think it is all right except for the A.I. part.

GPT-3 is taking a graph-structured object ("language" inclusive of syntax and semantics) over a variable-length discrete domain and crushing it into a high-dimensional vector in a continuous Euclidean space. That's like fitting the 3-d spherical earth onto a 2-d map; any way you do it, you do violence to the map.

I think systems like GPT-3 are approaching an asymptote. You could put 10x the resources in and get 10% better results, another 10x and get 1% better results, something like that.

You might do better with multi-task learning oriented towards specific useful functions (e.g. "is this period the end of a sentence?") but the training problem for GPT-3 is by no means sufficient for text understanding.

GPT-3 fascinates people for various reasons, one of them being almost good enough at language, lacking understanding, faking it, and being the butt of a joke.

If GPT-3 were a person with similar language skills and people blogged about that person, mocking its output the way we do with GPT-3, people would find that cringeworthy. Neurotypicals welcome it as one of their own, and aspies envy it because it can pass better than they can.

At $2 a page it can replace richmansplainers such as Graham and Thiel who never listen. It's not a solution for folks like Philip Greenspun who read the comments on their blogs.

For that matter, it may very well model the mindlessness of corporate America: if you accept GPT-3 you prove you will see the Emperor's clothes no matter how buck naked he is. AT&T executives had a perfectly good mobile phone business: what possessed them to buy a failing satellite TV business? Could GPT-3 replace that "thinking" at $2 a page? Such a bargain.


Why do you think that systems like GPT-3 are approaching an asymptote? Most people I've talked to say the opposite; they wouldn't have expected GPT-3 could be so much better than GPT-2 with no major additional breakthroughs.


It's just structurally wrong for the domain.

For instance, understanding language requires some of the capabilities of a SAT solver. This was something everybody believed in 1972, but it is denied today.

Fundamentally "understanding" problems require the ability to consider multiple alternative interpretations of a situation, often choose one or work with the incomplete knowledge you have.

Back in the 1970s we had intellectually honest people like Hubert Dreyfus writing books like "What Computers Can't Do" that described many specific ways the architectures of the time fell short. People working on GPT-3 are doing something academically valid (able to produce results that are meaningful to a community), but from an engineering standpoint it is like building a bridge with only one end, or a tall tower that carries no load.

GPT-3 has a structural mismatch with the domain it works in. Unlike early medical diagnosis systems like MYCIN, it is never a doctor, it just plays one on TV and it does the "passing for neurotypical" terrifyingly well.

The secret of GPT-3 is that people want to believe in it. Somebody will have it generate 100 text snippets and they will show you the three best. Your mind makes up meaning to fill in for its mindlessness. When this was going on with ELIZA in 1965, people quickly understood that ELIZA was hijacking our instinct to make meaning.

For some reason people don't seem to have that insight today, and I can't figure out why. Back in the 1980s there was a lot of fear about compressing medical images because it could lead to a wrong diagnosis. Today you see articles in the press that accept, completely unquestioningly, that a neural network trained to hallucinate healthy and cancerous tissue will always hallucinate the right thing when you are looking at a real patient.


> People working on GPT-3 are doing something academically valid (able to produce results that are meaningful to a community), but from an engineering standpoint it is like building a bridge with only one end, or a tall tower that carries no load.

To me it seemed like the opposite. They are essentially working without any hypothesis of how their model actually works, without any model of the way it actually learns or the way it produces the results that it does, and instead placing blind trust in various metrics that are improving.

They are treating this as an engineering problem - how can we make the best human-sounding text generator - and not like a traditional research problem. GPT-3 has not taught us anything about anything except "how to generate text that seems human-like to humans". We have no firm definition of what that means, we have no idea of why it works, we have no idea of any systematic failures in its model, we know next to nothing about it, other than its results on some metrics.

Imagine the same applied to physics - if instead of inventing QM and Relativity or Mechanics, physicists got it in their head to try to feed raw data into a black box and see how well it predicts some observed movements.

In fact, this would be a pretty interesting experiment: how large would a deep learning model need to be to accurately predict what mechanics predicts, given only raw data (object positions, velocities, masses, colors, surface roughness, shape, taste, etc.)? Unfortunately, I don't think anyone has been interested in this type of experiment, because it is not useful from an engineering (or profit) perspective.


>Imagine the same applied to physics - if instead of inventing QM and Relativity or Mechanics, physicists got it in their head to try to feed raw data into a black box and see how well it predicts some observed movements.

>In fact, this would be a pretty interesting experiment: how large would a deep learning model need to be to accurately predict what mechanics predicts, given only raw data (object positions, velocities, masses, colors, surface roughness, shape, taste, etc.)? Unfortunately, I don't think anyone has been interested in this type of experiment, because it is not useful from an engineering (or profit) perspective.

Isn't that pretty much what Google's AlphaFold is doing?

https://deepmind.com/blog/article/AlphaFold-Using-AI-for-sci...

And it seems GPT-3 formed concepts relating words together without being asked; it's not picking the next best word strictly as a matter of statistical probability. So why wouldn't that apply to physics simulations, chemistry, etc.?

Feed it chemical formulas and balancing equations from old chem 101 textbooks and it will fill in the blanks and start teaching itself how those things relate just by being corrected enough; then you can see if it has any predictive value.


I think both of your points are addressing different problems than what I was suggesting.

My point is that an interesting scientific question is: "is the huge size of the GPT-3 model intrinsic to the problem of NLP, or is it an artifact of our current algorithms?"

One way to answer that is to apply the same algorithms and methods to mechanics data generated from, let's say, classical mechanics, and compare the learned model's size with the size of the classical mechanics description. If the model ends up needing roughly the same number of parameters as classical mechanics, then that would be a strong suggestion that NLP may intrinsically require a huge model as well. Otherwise, it would leave open the hope that language understanding can be modeled with fewer parameters than GPT-3 requires.

Your examples are still in this realm of engineering - trying to apply the black box model to see what we can get, instead of studying the model itself to try to understand it and how it maps to the problem it's trying to solve.
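
To make that concrete, here is a minimal sketch of the kind of experiment I mean (my own toy setup, nothing anyone has actually run as far as I know): generate projectile trajectories from classical mechanics, fit a small generic network to predict the next state, and compare its parameter count with the two constants (g and the timestep) that the analytic update needs. In PyTorch it's roughly:

    import torch
    import torch.nn as nn

    dt, g = 0.05, 9.81  # the entire "theory" is two constants

    def simulate(n):
        # raw state = (x, y, vx, vy); exact next state from classical mechanics
        s = torch.rand(n, 4) * torch.tensor([10.0, 10.0, 5.0, 5.0])
        nxt = s.clone()
        nxt[:, 0] += s[:, 2] * dt   # x  += vx * dt
        nxt[:, 1] += s[:, 3] * dt   # y  += vy * dt
        nxt[:, 3] -= g * dt         # vy -= g * dt
        return s, nxt

    model = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 4))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    for step in range(2000):
        s, nxt = simulate(256)
        loss = nn.functional.mse_loss(model(s), nxt)
        opt.zero_grad(); loss.backward(); opt.step()

    n_params = sum(p.numel() for p in model.parameters())
    print(f"final loss {loss.item():.5f}; {n_params} parameters vs. 2 constants")

Even this trivially linear system eats a few hundred parameters in a generic architecture (a hand-picked linear layer would need far fewer); measuring that gap on harder systems is exactly the comparison I'm after.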


I think most people are still aware of the problem. I've seen a few people go off the rails, but most people have been well aware that "GPT-3 is having a conversation with me!" isn't itself particularly interesting.

On the other hand, we should acknowledge that humans are also structurally wrong for most of the domains we work in. A general-purpose neural network isn't a great tool to diagnose cancer, certainly - but it doesn't have to be great to exceed some radiologist's general-purpose light detectors. I think GPT-3 starts to edge into the territory of demonstrating Dreyfus was substantially wrong, and recognizably computer-like architectures are fully capable of doing abstract reasoning.

(That's not to knock on Dreyfus! Other voices in his era were optimistic to an absurd degree, and "come on guys our computers aren't that smart" was a very necessary response.)


> and crushing it into a high-dimensional vector in a continuous euclidean space. That's like fitting the 3-d spherical earth onto a 2-d map; any way you do it you do violence to the map.

Adding to this, most metrics can't be embedded well in euclidean space. Even something as simple as 4 nodes in a loop using the shortest path as your metric -- there's a minimum amount of error for any embedding into any euclidean space, and it's well above 0.

It's a bit surprising to me that we've hobbled along this far shoving square pegs into round holes wrt NLP, since fundamentally that can't be fixed with more parameters and bigger coprocessors. Apparently some interesting features of natural language actually are euclidean after all.


> Adding to this, most metrics can't be embedded well in euclidean space. Even something as simple as 4 nodes in a loop using the shortest path as your metric -- there's a minimum amount of error for any embedding into any euclidean space, and it's well above 0.

Are you talking about a 2d Euclidean space, or about any number of dimensions?


Any number of dimensions actually :)

For that example it suffices to show it in 3D since any euclidean embedding of n+1 points can be isometrically embedded in an n-dimensional space, so if a 3D embedding with error E doesn't exist then neither does an ND embedding for any N>3.

Finding the minimum error requires a tad more effort, but it's not too bad to show that no embedding has 0 error:

Take a cycle a->b->c->d->a where every edge has length 1. Suppose a 0-error embedding exists. Points (a), (b), and (c) must be embedded collinearly (since |a-b| = |b-c| = 1 and the shortest-path distance |a-c| = 2, the triangle inequality forces (b) to be the midpoint of the segment from (a) to (c)). Then the only possible location for point (d) satisfying the distance requirements |a-d| = |c-d| = 1 is precisely where we placed point (b), but then point (d) can't possibly have distance 2 from point (b).

By itself that doesn't show that arbitrarily small errors are impossible, but that stronger statement is also true.
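
If you want to see the floor numerically rather than by argument, here's a rough sketch (scipy-based; the value below is just what the optimizer converges to, I haven't proved it's the global minimum): embed the four cycle nodes in R^3 and minimize the worst-case deviation from the shortest-path distances.

    import numpy as np
    from scipy.optimize import minimize

    # shortest-path metric on the 4-cycle a-b-c-d-a (nodes 0..3)
    target = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 1, (0, 2): 2, (1, 3): 2}

    def max_error(flat):
        pts = flat.reshape(4, 3)
        return max(abs(np.linalg.norm(pts[i] - pts[j]) - d)
                   for (i, j), d in target.items())

    rng = np.random.default_rng(0)
    best = min((minimize(max_error, rng.standard_normal(12),
                         method="Nelder-Mead",
                         options={"maxiter": 20000, "fatol": 1e-9, "xatol": 1e-9})
                for _ in range(20)),
               key=lambda r: r.fun)
    # expect roughly 0.24 (a slightly scaled square equalizes all six errors);
    # the point is that it never approaches 0
    print(best.fun)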


Thanks!


He's talking about curvature.


no, global topology, which is related to curvature.

For instance, I can go from Sydney to Sao Paulo going either East or West on the territory, but on the 2-d map you can only draw one path. You can map one point on the territory to multiple points on the map but that is itself a mismatch with the territory.

A model like WordNet, for instance, loses information about out-of-dictionary words. Words like "if", "and", and "bit" are in the dictionary, and maybe 95% of the words in your text are in the dictionary, but 50% of the meaning is in the out-of-dictionary words. There are things like FastText that do a little better (they have a fighting chance of guessing at latin and greek words smushed together), but they still make mistakes at an early phase of analysis that can't be recovered from at later stages.

For a domain such as medical notes (say, the abstract of a medical case study) you might want to answer a question like "Did the patient die?" or "What code would I bill insurance for this?", and much more than half the time an embedding throws out a piece of information which is essential to computing the right answer as opposed to guessing at it.


GPT-3 is a search engine pretending to be an AI.


Can anyone provide the background or framework for how I should see the business value of GPT-3?

Are there businesses that have a tremendous need for the possibilities it provides?

I've seen use cases that some NLG companies provide, like sports and stock summaries, but what world should I imagine where this is transformative?


> so dropping $1bn or more on scaling GPT up by another factor of 100x is entirely plausible right now.

I'd note it's rare for the cost of scaling a computing project to be a linear function of scale.

100x-ing an AI project could be 1100x cost.


For AI on this scale, there are two important costs: the cost of compute and the cost of engineer salaries. Scaling the compute is probably a bit above linear due to various overheads, but the cost of engineer salaries grows far less than linearly, so I would expect the total cost of scaling large-scale AI to be sublinear.


I would expect sublinear scaling in compute cost. Learning curve effects are strong in DL: https://cdn.openai.com/papers/ai_and_efficiency.pdf The more runs you do, the cheaper they get. Plus, OA is no longer running on rented V100s; they have their own MS Azure supercomputer, remember (in fact, GPT-3's evaluation was interrupted because they moved to it), so they avoid the enormous cloud margins.


This is a fascinating discussion -- I'm curious what people think is the next step with AI. The post and several commenters here talk about how the tech in GPT-3 is "dumb" in that it's a big network, but the network architecture is a fairly standard approach.

I'm curious what people think are the next stages of AI research that companies are working on... Is it Probabilistic Graphical Models? Is it Probabilistic Programming? Is it knowledge graph extraction from text? Is it something else? Curious what people think...


I think it's about automatically building accurate and well-factored world models online that ultimately integrate not only high-dimensional sense data (such as visual information) but also language. This involves effectively solving the symbol grounding problem, among other things. There is some serious effort in this direction in deep learning.

There are also other efforts using different types of probabilistic programming as well as symbolic and neural net combinations.

There's another link on one of the first few HN pages right now about dreaming. I think that dreaming gives one a lucid demonstration of some of the capabilities that we need to emulate if we are going to have human-like intelligence. AI will need to be able to visualize new situations, basically like on-demand, flexible simulations of mashed-up possibilities, involving things like physics and psychology etc.

I think we almost need the AI to have something like a 3D game engine with physics, one in which it can effortlessly conjure up AI agents, and in which many of the physics rules and agent behaviors are automatically learned from only a few examples. This is the type of capability that allows humans (and some other animals) to adjust so readily to new situations.

I speculate that there may be some representation or type of computation that has not been invented yet which facilitates both the simulation-type data and also the abstractions over it, all the way up to language, in a more seamless way than has so far been described. I saw a paper talking about the symbol grounding problem in terms of everything being categories, but really in the end it was broken down into something kind of like Lisp + probabilistic programming, and it seemed to not really have sufficient granularity to really do justice or properly integrate sense data. Certainly not in a seamless or truly unified way in my opinion. Although I guess I don't really understand category theory.


> GPT-3 is the first AI system that has obvious, immediate, transformative economic value.

Seriously? No other piece of machine learning has had economic value? How short sighted.


> GPT-3 is the first AI system that has obvious, immediate, transformative economic value.

This statement is unbelievably ignorant of history. Just picking one random example out of a hat: planning and scheduling systems have had a profound impact on the manufacturing and shipping industries for many decades now.


If I had the money and the dataset for training a model like GPT (that's a big if), is the code to implement such a thing trivial? Or is it, like their trained instance itself, a valuable proprietary asset of OpenAI?


You would have to also reproduce their code, which is the largest cost in any software development project.

The code is non-trivial, but if you wait, someone will reimplement it.

The dataset is also non-trivial, because they probably cleaned the data, which takes real work.

It’s a valuable asset but it’s not like someone couldn’t reproduce it.
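
For a sense of scale: the published architecture itself is short. The core building block, a causal self-attention layer, is roughly the following in PyTorch (my own bare-bones sketch, not OpenAI's code). The genuinely expensive parts are the data collection and cleaning and the distributed training, not this:

    import math
    import torch
    import torch.nn as nn

    class CausalSelfAttention(nn.Module):
        def __init__(self, d_model=768, n_heads=12, max_len=1024):
            super().__init__()
            self.n_heads, self.d_head = n_heads, d_model // n_heads
            self.qkv = nn.Linear(d_model, 3 * d_model)   # query/key/value projections
            self.proj = nn.Linear(d_model, d_model)      # output projection
            # lower-triangular mask so each position only attends to the past
            self.register_buffer("mask", torch.tril(torch.ones(max_len, max_len)).bool())

        def forward(self, x):                            # x: (batch, seq, d_model)
            b, t, d = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # split into heads: (batch, heads, seq, d_head)
            q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                       for z in (q, k, v))
            att = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
            att = att.masked_fill(~self.mask[:t, :t], float("-inf")).softmax(dim=-1)
            out = (att @ v).transpose(1, 2).reshape(b, t, d)
            return self.proj(out)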


I fear Betteridge's law of headlines applies here.

A CS lecturer of mine told us that when he was a student he had a lecturer who advised him to be sceptical of AI revolutions. That was nearly 20 years ago. I've no doubt we'll see further steps but I'm not going to hold my breath for something transformative.


Yes, we should be skeptical of AI revolutions, but that doesn't mean they are impossible, or that we should never devote some thought to updating our evaluations of the current risks.


I’d like to go on the record as being a GPT-3 skeptic. Yes, it’s a massive improvement over markov models, and yes, it will be used for propaganda. But the AI effect is very strong, and in a year or two people will be used to it and you’ll see more writing to the effect “why GPT-3 wasn’t such a big deal after all”.

Personally, my guess is that it’s actually just plagiarizing the training set in a way that most researchers will come to view as a kind of cheating. What I mean by that is, if you take some plagiarism detection software and run it on GPT-3’s output, it will ring like crazy.

I say this both because I believe it and because if it’s not the case, if we really have a proto-AGI on our hands, then being wrong won’t matter. I sincerely hope that we are a thousand years away from that, because otherwise we are plainly doomed.


> What I mean by that is, if you take some plagiarism detection software and run it on GPT-3’s output, it will ring like crazy.

Somebody should try this. I ran a few paragraphs from AIDungeon through https://plagiarismdetector.net/ and got zero or low plagiarism percentages, but I'd imagine there are much better detectors that aren't publicly available.
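
A crude version of the check is easy to write yourself if you have a chunk of web text to compare against (a toy sketch, nowhere near a real plagiarism detector; gpt3_sample and reference_corpus below are placeholders you'd have to supply):

    def word_ngrams(text, n=8):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def verbatim_overlap(sample, corpus, n=8):
        # fraction of the sample's word 8-grams that appear verbatim in the corpus
        sample_grams = word_ngrams(sample, n)
        if not sample_grams:
            return 0.0
        return len(sample_grams & word_ngrams(corpus, n)) / len(sample_grams)

    # verbatim_overlap(gpt3_sample, reference_corpus) near 1.0 would support
    # the "it's mostly plagiarism" view; near 0.0 would undercut it.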


Strange logic.

We're doomed regardless. We don't have a thousand years. Maybe not even 100.


Are you referring to environmental collapse? I agree but I'd like to try a bit harder before calling it. :)


Indeed, see Gary Marcus' critique from last year: https://thegradient.pub/gpt2-and-the-nature-of-intelligence/


It's worth chasing that with gwern's critique of Marcus' critique: https://www.gwern.net/GPT-3#marcus-2020

(the critique is: GPT-3 can in fact do all the things Marcus said it couldn't)


I can't play with GPT-3 but when I play with GPT-2 I can easily trick it with counting games. It does well with 0,1,2,3,.... but things like 0,1,3,6,10, get poor responses. Is GPT-3 good at that?


Yes, I have tried it with GPT-3:

Q: what comes next in the series: 0,3,6, A: 9

Q: what comes next in the series: 0,3,6,9, A: 12
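
The GPT-2 side is easy to poke at locally; something like this sketch (using the Hugging Face transformers library; the prompts are just examples) reproduces the parent's counting-game test:

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    prompts = ["0, 1, 2, 3, 4,",     # plain counting, which the parent says GPT-2 handles
               "0, 1, 3, 6, 10,",    # triangular numbers, where it reportedly falls over
               "1, 1, 2, 3, 5, 8,"]  # Fibonacci, for good measure

    for p in prompts:
        out = generator(p, max_new_tokens=8, do_sample=False)
        print(p, "->", out[0]["generated_text"][len(p):])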


I think the question of "how much can it reason" can be made more specific as "how far away can it reason from the learned examples".

Increases in reasoning power should allow for much smaller usable models.


Yes - reproducing fragments from various texts can look impressive, and could be useful in some applications - like creating comments on HN! (I give it a week before someone says "GPT3 has commented on HN and earned 500 Karma!!!"). But I don't think it can be a reliable problem solver or co-creator.

The fun bit is generalization. Create a pattern that hasn't been read before. Hard with GPT-3, because it's been given everything to read...


I think that this is still reproduction. Try things like 1,A,3,C,5,E,7 or a,1,aa,2,aaa,3,aaaa


Do you feel like the lecturer was correct, or incorrect?


Question:

Is a collapse in learning time a possible breakthrough for the future, or do we have definitive ~information-theoretic bounds for, say, the number of dimensions, etc.?


>> GPT-3 is the first AI system that has obvious, immediate, transformative economic value.

To say the least, it is not immediately clear where that "transformative economic value" lies.

From what I've seen so far, GPT-3 can generate structurally smooth but completely incoherent text and, despite claims to the contrary, cannot perform anything close to "reasoning" [1]. It can also perform some side tasks like machine translation and question answering, though with nowhere near good enough accuracy to be used as a commercial solution for these tasks.

All this is not very useful or even interesting. Text generation is a fun pastime, but unless one can control the generation to very precise specifications, to generate good quality text that makes sense on a particular subject, text generation is nothing but a toy with no commercial value (and even its scientific value is not very clear). And GPT-3's generation cannot be controlled to such precise specifications.

We've had AI software that could interact intelligently with a user since the 1970s, with Terry Winograd's SHRDLU [2], and that never led to "immediate, transformative economic value", even though it was every bit the sci-fi-like AI program that could be directed by natural language to perform specific tasks with competence, albeit in a restricted environment (a "blocks world"). GPT-3 is not even capable of doing anything like that (nor are any other modern systems). How does a language model that is likely to respond with "blue offerings to the green god of mad square frogs" to a request to "place the blue pyramid on the red sphere" bring "transformative" value?

In fact, we've had systems capable of generating much more coherent (and still grammatically correct) text for some time [3], and even those have not caused a dramatic upheaval of "transformative economic value".

I'm sorry, but I'm afraid that, with GPT-3, we're again in a spiralling peak of hype, just as we were a few years ago with all the claims about self-driving cars "next year" etc. I think we all know how those panned out.

In any case, you don't have to take my word for it. As with self-driving cars, all we have to do is wait a few years. Say, until 2024. We'll have a good idea about GPT-3's "transformative value" by then.

__________________

[1] Unless of course one insists on Procrusteanising the definition of "reasoning" sufficiently to cover essentially random guessing.

[2] https://en.wikipedia.org/wiki/SHRDLU

[3] I'll need to dig up some references if you ask, but in the meantime search for "story generation".


Anyone know how to test this yet?


I'm very heavily inclined to believe that yes we are.

And my money is still on DeepMind.


Can you link to the critical paper you are referencing?


Perhaps I missed it, but what are some useful applications of GPT-3?


For it "as is" - i.e. without imagining any new things, I would say you could make money from AI Dungeon, chat bot, and selling API access.

I know that I badly want to play with the AI and would pay some amount per month to get some number of queries.


AI Dungeon has a monthly fee now to upgrade to GPT-3 and a new much better engine. It works pretty well.


Promoting your SoundCloud account at the end of a Twitter thread.


You know what? There are two people I'd really like to interview about GPT-3 (and what a hypothetical GPT-4 or 5 could achieve).

One is Hofstadter. The other is Ted Chiang.



