When I read text generated by GPT-3, I get a very strange feeling.
I understand that the text as a whole has no clear meaning. Nevertheless, my
mind unconsciously _tries_ to extract meaning by evaluating sentences not as
direct statements but as metaphors with some more profound sense.
That triggers a train of thought that eventually leads to some new concept or idea
that could be described by such a set of sentences.
It's like reading a book you don't quite understand, yet reading the sentences
over and over again, trying hard to get a better understanding of what the author
is trying to describe to you.
With GPT-3 it is like reading the reminiscence of your own dream, trying to
grasp its fleeting meaning, to understand what it is about.
I feel that GPT-3 may be very helpful in getting the human mind unstuck from
whatever problem is at hand. To get new thoughts, new ways. New discoveries.
I think GPT-3 is especially suited to triggering your pareidolia in this particular format, because we know that presentation slides often need an accompanying speech to give them context/meaning; and without it, they have a tendency to be just this disjointed.
So, when you read "presentation slides" like this, the same mental algorithm that tries to piece together what "the speech" was for a normal slide deck kicks in, and gives you some valid-seeming ideas.
You're spot on with it feeling dreamlike. Any short chunk feels like it makes sense, but on the whole it flows together in bizarre ways where it doesn't feel quite like there's underlying structure. I don't think I've ever experienced media that was as reminiscent of the experience of dreaming as GPT-3 generated passages are.
It reminds me of how Deep Dream was the first thing that _really_ reminded me of what psychedelic visuals are like, compared to a "trippy" piece of art. GPT-3 _really_ reminds me of dreaming compared to human attempts at evoking that feeling.
I originally held that David Lynch's film "Mulholland Drive" used all the conventional motifs of a horror film to construct a whole that was less than, and yet too much more than a cohesive horror story. It wasn't complete, but had the sense that it was. Stitched together in a nonsensical manner, but which felt familiar.
I'm not saying that's true, but it was my conversational theory for a while.
Improv (https://en.wikipedia.org/wiki/Improvisational_theatre ) is, among other things, the art of building worlds and stories by justifying things you don't understand. So naturally there is an "An Improvised Theatre Experiment" called "Improbotics" (https://improbotics.org/) from Belgium, where generated text, spoken by a robot, is used in improv scenes in a multitude of different ways.
According to an interview with Ben Verhoeven conducted on 2020-06-11 by the Moscow Improv Club (https://www.instagram.com/tv/CBTQsCanQ4g/ ), they used GPT-2 fine-tuned on movie subtitles.
Ironically, I remember people saying similar things about simpler programs like Eliza (very far) back in the day.
Wonder what'll be the multiple orders of magnitude upgrade from GPT-3 that we say it about next?
I do agree with you, though. This is getting close to free-writing in terms of being able to unearth stuff semi-randomly. Imagine a GPT-3 that saw all your past journals and online conversations, and bouncing stuff off that.
Agreed. The text _seems_ like it could mean something -- so you squint harder and try to find the meaning in it. Sometimes you do!
To me, the most sensible slide in this deck was "Why you should always code like it's your last day on Earth." / "It'll push you just enough to get you to finish whatever you need to finish". Surprisingly true!
Yes, it can memorize short phrases similar to how it "remembers" words. It's trained on a web corpus that includes and emphasizes Wikipedia. The model is big enough to memorize some things, though not in such a way that they can reliably be retrieved, and it will make stuff up when it doesn't remember. So it's not Google but sometimes it's reminiscent.
Here is a Q&A conversation where I found some things it "learned".
I like how GPT-3 was so nervous about giving this presentation that it got all existential at the end. Code like it's your last day on earth! Stop developing nuclear missiles and ad blockers! Make sure you celebrate with some alcohol.
I loved everything about this presentation. It's better than some I've made myself.
> The next slide had the text "And then when you finally do finish this nice feature, you should always make sure you drink the required amount of alcohol." with a background image of whiskey bottles. The decorative text on that slide was: "But don't drink too much alcohol and end up like me."
The joke is sound, and I can imagine a number of former secondary school classmates presenting it in these exact words.
Yeah, I thought it was really good. It sounds like one of the more casual conference talks, and shows how common talk of alcohol use and programming is on the Internet. (I actually don't know how GPT-3 is trained, but I assume it's just random webpages.)
This to me is another indicator of how incredible GPT-3 is at picking up patterns. Getting existential at the end of your tech talk is basically a trope of the genre at this point. It nailed it.
Something I don't understand is how to go about "prompting" GPT-3...
It's basically one big auto-complete, yes? It's a function that approximates the most likely completion for the input text, right?
So if you directly "ask" it a question, like "What's a great cookie recipe?" Wouldn't the "completion" for that query be from the perspective of the person asking?
Wouldn't it be better to prompt:
"Here is the greatest cookie recipe I've ever seen:" and then letting it complete on that?
I gave GPT-3 your prompt. It seemed to focus a bit more on enthusiasm about the recipe than about literally quoting the recipe.
> Here is the greatest cookie recipe I've ever seen:
> Here is the greatest cookie recipe I've ever seen: Quick Chocolate Peanut Butter Chip Cookies from the Minimalist Baker.
> I've made it four times in the last two weeks. It only takes 12 minutes to whip up and yields the most amazing cookies. These are soft, chewy, gooey, delicious, thick, thin, crunchy and mega chocolatey.
> It was hard to stop eating the cookies while I took the photos (even though I was on a 20-minute break from the photo shoot of my new ebook cover).
> Here is what the recipe looks like when printed out:
> Like all recipes from the Minimalist Baker, it's so simple, it's insane. No need for complicated gluten-free flours, unusual mixes of ingredients or 7-step recipes. Just:
> Mix flour, sugar, baking soda, baking powder and salt
> Mix in chopped chocolate and peanut butter
> Add melted butter and milk
> Stir until combined
> Drop onto cookie sheet
> This recipe calls for 3/4 cup of peanut butter. I used Trader Joe's creamy peanut butter, and that's how it looks:
This confirms something I've noticed about GPT-3, or at least GPT-3 as it is trained using the public internet as a corpus...
This response reads exactly like a so-called "recipes" website, in which the writer gives their whole life story, side-notes, and wanders around over several paragraphs before finally getting to the damn recipe.
This makes me think the public internet is not the most sanitary input for training. That type of "recipe" evolved, IMO, to snatch the highest SEO rankings, adding a bunch of keywords, snippets, affiliate links, etc, instead of just giving me the text of the recipe. And now GPT-3 has learned the same SEO tricks (at least when you give it my input, which is a very click-baity opening, to be fair...)
I tried a new prompt with the name of the recipe from the previous output and the word "Ingredients:".
> Quick Chocolate Peanut Butter Chip Cookies from the Minimalist Baker.
> Ingredients:
> 2 cups almond flour
> 1/4 cup cocoa powder
> 1/2 teaspoon baking soda
> 1/4 teaspoon fine sea salt
> 1/2 cup natural peanut butter
> 3/4 cup agave nectar (or brown rice syrup or coconut nectar)
> 1 teaspoon vanilla extract
> 1 tablespoon applesauce
> 1 cup chocolate chips
> Directions:
> Preheat oven to 350 degrees F and line a baking sheet with parchment paper. Mix together almond flour, cocoa powder, baking soda and salt in a large bowl. In a small bowl or measuring cup, mix peanut butter, agave nectar, vanilla, and applesauce until combined. Add wet ingredients to the dry ingredients and stir until just combined. Add chocolate chips and stir gently until just combined. Use a 1-2 tablespoon cookie scoop to scoop cookies onto the lined baking sheet. (The cookies will not spread, so it doesn’t matter how closely you space them.) Bake for 10-12 minutes.
> Time:
> 30 minutes to prep and cook
Very good at producing something that credibly looks like a baking recipe. I'm not sure whether it would be good, but maybe I should try to bake it.
It's amusing that the first completion suggested that the recipe would have only commonly-used baking ingredients, while the second recipe notably uses almond flour instead of wheat flour, and agave nectar instead of sugar. (I'm actually now quite curious to actually bake these GPT-3-devised cookies and see how they come out... although I think I might use wheat flour instead of almond flour if I do!)
Yes, the most likely completion, but not necessarily from the same writer. This is learned from whatever text it has seen, which includes text in various formats, like interviews. So when you have a question, the most likely next sentence is the answer (from another person). Or, to make that more explicit, just put a prefix like "Q1? A1. Q2? A2. Q3?" where "Q3" is your question, and let it auto-complete. Most likely the auto-completion is "A3".
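A minimal sketch of that prefix trick, assuming the 2020-era openai Python client's Completion endpoint (the example questions and parameters are just illustrative):

```python
# Sketch of the "Q1? A1. Q2? A2. Q3?" prefix idea, assuming the
# 2020-era openai Completion endpoint. Prompt wording is illustrative.
import openai

openai.api_key = "..."  # your API key here

prompt = (
    "Q: What is the capital of France?\n"
    "A: Paris.\n"
    "Q: Who wrote 'Moby-Dick'?\n"
    "A: Herman Melville.\n"
    "Q: What's a great cookie recipe?\n"
    "A:"
)

response = openai.Completion.create(
    engine="davinci",      # the base GPT-3 model at the time
    prompt=prompt,
    max_tokens=100,
    temperature=0.7,
    stop=["\nQ:"],         # stop before it invents the next question
)
print(response.choices[0].text.strip())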
Now I'd be interested to see GPT-3 trained on code samples from open-source repositories. Would it compile?
Could it output a decent implementation of an algorithm if you were to feed it the comment describing it? How about more general statements about input and output?
The holy grail would be to code just by describing what you expect the code to do: a few plain-language (maybe a more structured subset?) sentences stitched together, with the API glue autocompleted.
And for reverse-engineering? Train it on drivers, then feed it packet captures. Could it make sense of the data?
"Deep TabNine is trained on around 2 million files from GitHub. During training, its goal is to predict each token given the tokens that come before it....Deep TabNine is based on GPT-2."
So this is GPT-2 not GPT-3, and it's designed to give line-by-line autocompletions, but I'm gathering that the way we're headed, the answer to your first question is approaching "yes"...
There was some good discussion about this on another GPT-3 thread this weekend, but I don't have the link handy.
The author prompted GPT-3 with some questions like, what is 10-1 (9), 100-1 (99), 1000-1 (999), 10000-1 (9099); i.e. after a while, it can't really "recurse" deeply enough to get the right answer anymore. The author also asked it some coding questions; it could answer something like "write a Ruby function to count the number of Xs in a word" but not "reverse the list [foo bar baz]" (not the exact examples, sorry). There again seems to be a point where it gets the idea, but can't compute deeply enough to actually answer this sort of question.
Yes, no doubt it is impressive. But some people are speculating that a lot of cherry-picking was done for this demo. I have access to GPT-3, but I am unable to reproduce such results.
Yep, that seems correct, and this is why it's been exciting to see these demonstrations from an AI research standpoint - it was pretty unexpected that you could 'coach' a giant autocomplete so powerfully with prompting.
In short, yes. I think of it as plinko with a hundred billion pegs...the output is extremely sensitive to the starting conditions (including things like whitespace).
Given we really don't know how it's all working, it's safe to say that prompt composition is an artform that one can develop a skill at.
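For what it's worth, the whitespace sensitivity is easy to check yourself: sample the same prompt with and without a trailing newline and compare. A rough sketch, assuming the 2020-era openai Completion endpoint (the prompt text is made up):

```python
# Rough sketch: compare completions for prompts that differ only in whitespace.
# Assumes the 2020-era openai Completion endpoint; results vary run to run.
import openai

openai.api_key = "..."

for prompt in ["The best way to learn to code is",
               "The best way to learn to code is\n"]:
    out = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=40,
        temperature=0.7,
    )
    print(repr(prompt), "->", out.choices[0].text.strip())
```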
I was on a discord server with some other folks and one of them had access to the GPT-3 api. We provided a prompt similar to the following:
This person is a Canadian singer, songwriter, and actor. Discovered at 13 years
old by talent manager Scooter Braun after he had watched his YouTube cover song
videos, they were signed to RBMG Records in 2008. With this person's debut EP My
World, released in late 2009, they became the first artist to have seven songs from
a debut record chart on the Billboard Hot 100.
Q: What is this person's name?
A:
At first pass it gave some weird answer. So we deleted the answer and tried again. The second time it wrote 'Justin Drew Bieber'.
That chunk of biographical info is from his Wikipedia page, but it's really kind of remarkable to me that it was able to really even understand the question, let alone mine its relationships and cough up the right answer.
That sounds well within the capabilities of IBM's Jeopardy bot, Watson. That was demoed 10 years ago and, while initially impressive, seemed to be mostly cheap tricks even then.
I typed your text directly into Google, and it immediately gave a context box with the correct answer.
I even typed in: "person who sings EP MyWorld" and got the same answer. Given how much data there is about him (the infamous Twitter scaling problems), I don't think it's necessarily a good test. There's gonna be tons of relations in the corpus for him.
The default sampling parameters are really not that hot for cases where there is nearly a unique correct answer -- bad luck will force it to answer wrong.
From that prompt I (a human) don't know if I should expect something like "Albert Einstein" or the straight answer, so higher temperature sampling can risk kicking it down a silly path.
I put the same prompt into the question-answering preset and it immediately answered
A: Justin Bieber.
So that suggests that the prompt and temperature (etc.) are relevant to getting the kind of behavior you want. But it didn't seem to have any trouble getting this right the first time with the appropriate preset.
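For questions with essentially one correct answer, turning the temperature down toward greedy decoding is the usual knob. A sketch of that setup, assuming the 2020-era openai Completion endpoint, with the biographical context abbreviated:

```python
# Sketch: low temperature for questions with a single correct answer.
# Assumes the 2020-era openai Completion endpoint; context abbreviated.
import openai

openai.api_key = "..."

context = (
    "This person is a Canadian singer, songwriter, and actor. Discovered at 13 "
    "years old by talent manager Scooter Braun after he had watched his YouTube "
    "cover song videos, they were signed to RBMG Records in 2008.\n"
)

response = openai.Completion.create(
    engine="davinci",
    prompt=context + "Q: What is this person's name?\nA:",
    max_tokens=10,
    temperature=0.0,   # greedy decoding: always pick the most likely token
    stop=["\n"],
)
print(response.choices[0].text.strip())
```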
Pretty neat. I bet we're going to see a lot of GPT-3-powered schoolwork in the future. "Write me a term paper on..." and then you're just cleaning it up and making adjustments.
I bet a lot of people would get away with it, too. These generated texts are often just convincing enough to make you think they would make sense if you were less tired/distracted/whatever. A teacher in a rush or at the end of the day could easily be fooled (especially considering the quality of many human-generated term papers).
Now we need an equivalent of a "deepfake recognizer" for these sort of generated texts, to distinguish texts with some sort of coherent meaning from texts without.
As a bonus, such a model could be used in a word-processor to redline places where a human writer has written a lot without saying much.
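One crude starting point for such a detector (a heuristic, not a real "deepfake recognizer") is to score text with a smaller language model and flag stretches that are suspiciously predictable, which is roughly what the GLTR visualization tool does with GPT-2. A sketch using the Hugging Face transformers library:

```python
# Crude sketch: score text by average per-token log-likelihood under GPT-2.
# Suspiciously high likelihood (low "surprise") can hint at generated or
# low-information text, but this is a heuristic, not a reliable detector.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model returns mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return -loss.item()  # higher = more predictable to the model

print(avg_log_likelihood(
    "Not only are all software development problems more or less the same, "
    "but the best way to solve them is usually so simple, "
    "that a monkey could do it."))
```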
This presentation almost definitely passes the Turing test: if it were posted here without any further context, people would definitely be arguing about whether coding was really "so simple even a monkey could do it".
It starts by saying how to be a coder in Japan, then never mentions Japan again. All coding problems are exactly the same, until a few slides later when actually they are complex. The paraphrase of the Feynman quote isn't actually a paraphrase at all. There's no actual substance. It's like a train of thought that gets derailed every 5 seconds.
When I was in school in the 00's I had a lot of classes around how to write search queries.
I feel like the way this GPT-3 query was phrased was rather clever, and I wouldn't be surprised if we saw "how to prompt GPT-3" as the assignment itself in the medium-term future.
Honestly I think GPT-3's number one role in the next couple of years will be that of a muse. It's not really trustworthy as a direct source of info but it comes up with some really hilarious stuff and gets the brain working in new directions.
I don't know from a scientific perspective but it certainly feels that way.
I'd liken it to the results from Deep Dream style image generation. Each small part of the image makes sense but taken together it's a Lovecraftian nightmare that your brain can't quite process all at once.
"Not only are all software development problems more or less
the same, but the best way to solve them is usually so simple,
that a monkey could do it."
There's something I'd be curious to try with this—which I'm pretty sure would work.
Imagine how the output would differ if we modified the original prompt:
"A presentation was given at the Hacker News event in Kansai.
The title of the presentation was"
—to remark on the quality of the desired output in some way, e.g.:
"A delightful and surprisingly sophisticated presentation was given at the Hacker News event in Kansai. The presentation's clever title indicated"
Since the output is just supposed to be continuous with the prompt, it can just as well produce material that we see as more difficult/accomplished vs. average/uninspired. Of course it has its own limits in various categories (insight, coherence, humor—whatever), but those seem quite high from some examples I've seen: I think more typically the apparent quality of the output is limited by what the prompt suggests the quality should be.
In either case, how wild to have a tool where you can essentially just ask its output to be "better" and it'll probably work.
(Of course the output being "better" in this way is strictly from certain perspectives while its intrinsic behavior remains unchanged, but for most intents and purposes it seems that you can get genuine improvement practically just by asking for it.)
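If anyone wants to test the "just ask for it to be better" idea more systematically, it's a one-loop experiment. A sketch, assuming the 2020-era openai Completion endpoint, using the two prompt variants from this thread:

```python
# Sketch: the same request with and without quality qualifiers in the prompt.
# Assumes the 2020-era openai Completion endpoint; prompts are from the thread.
import openai

openai.api_key = "..."

prompts = [
    "A presentation was given at the Hacker News event in Kansai. "
    "The title of the presentation was",
    "A delightful and surprisingly sophisticated presentation was given at the "
    "Hacker News event in Kansai. The presentation's clever title indicated",
]

for p in prompts:
    out = openai.Completion.create(engine="davinci", prompt=p,
                                   max_tokens=60, temperature=0.7)
    print("PROMPT:", p)
    print("OUTPUT:", out.choices[0].text.strip(), "\n")
```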
I tried a modification of your suggested prompt; here is the response. (Only the first and third were written by me. The rest was generated using the aidungeon.io Dragon model with a randomness parameter of 0.6. All of this was done one-shot, with no rerolls.)
> A delightful and surprisingly sophisticated presentation was given at the Hacker News event in Kansai. The presentation's clever title and subtitle was:
The Future of Software Development: How to Build a Better Human Being with Artificial Intelligence (AI)
> Each of the charming and witty slides in the presentation is described below. The first presentation slide had
the following caption:
In this world, we are all programmed by our programming languages. We have been taught that these languages are good enough for us. But what if they aren't? What if there were another way? A better way? And how would you go about doing it? This talk will explore some of those possibilities and more. It will also discuss some of the philosophical issues surrounding AI and human-computer interaction as well as some practical aspects of building such an application.
This slide was followed by a series of slides which showed various examples of how artificial intelligence could be used to improve people. These included:
• Using AI to help people with learning disabilities learn new things. • Helping people who suffer from depression or anxiety disorders feel less anxious or depressed. • Improving the quality of life for the terminally ill through medical diagnosis and treatment. • Providing assistance in helping children with autism develop their social skills and communication abilities.
The presentation then went on to discuss some of the challenges that needed to be addressed in order to create such an AI. These included:
• Ensuring that the AI was safe and would not be able to take over the world or otherwise do harm to humans. • Ensuring that the AI was effective in helping people achieve their goals and better themselves.
The presentation then described a prototype system that had been developed by the speaker's team at the University of Tokyo. This system was able to assess a person's personality and goals and then provide them with guidance on how to achieve those goals. The system used AI technology based on a neural network trained with textual data from psychology papers as well as open source repository data from GitHub. The system was still in the prototype phase, but initial testing had been promising.
The presentation then concluded with a discussion of some of the speaker's personal views on the future of AI. The speaker felt that, while there were many challenges to creating safe and effective AI, he was optimistic that these could be overcome within his lifetime. He also felt that such AI would be an important tool in helping humanity achieve its potential.
The presentation was met with a mixed response from the audience. Some felt that the speaker was overly optimistic about the potential for AI. Others felt that the speaker's views were not optimistic enough and that he was ignoring some of the potential dangers of AI.
The speaker was then asked to respond to some of the audience's concerns. He agreed that there were some potential dangers to creating AI, but he did not feel that these were cause for undue concern. He pointed out that many of the concerns raised by AI risk advocates had been addressed by open research and debate. He also felt that the potential benefits of AI technology far outweighed any potential dangers.
The speaker was then asked about the implications of his work for society. He responded that he hoped that his work would help people achieve their goals and improve their lives.
Thanks for giving it a shot. Unfortunately I think there may be some subtle things in the modifications that could influence the rest of the output: for instance, there's a grammatical error in "The presentation's clever title and subtitle was"
And to reiterate "charming and witty" so soon after the initial "delightful/sophisticated/clever" gives the message a different tenor (I may have already overdone it with the qualifiers myself).
I think this is a subtle, important aspect of the problem: it's not quite just attaching positive qualifiers—they also have to be strung together in a style reminiscent of the contexts in which those qualifiers would actually be used as legitimate descriptions of quality material.
In any case, still some pretty coherent, interesting output. I need to find a way to play with this on my own, though. (I jumped into aidungeon.io for a minute, but didn't see a good way of setting up the prompt. Is it using GPT-3?)
Interesting how it shies away from unequivocal statements, littering qualifiers all over the place, like "fairly", "rather", "mostly", "pretty much", "usually", "almost all", "most likely", "might", "probably", etc.
While it does construct sentences that flow and are loosely on topic (and that's impressive), the content of the presentation is not particularly meaningful. Although I'll admit I liked the slide with the execution squad captioned "A happy ending to all our code".
Maybe it also trained on forum posts. I find myself constantly couching everything I write here with "usually" and "might" because if I don't, there is always some nit-picker that pops out of the woodwork for a cheap and easy "aha! Here is a counter-example to what you just said, therefore your point doesn't apply universally and is invalid!" Adding these qualifiers is automatic for me at this point, to avoid the hassle of dealing with pedantry.
Yeah, so far from some of the generated tweets, etc., it seems like there's a way of using this to seed creativity. It's much better than the current "give me startup idea titles" generators. I could see someone who's super stuck brainstorming presentation topics getting a bunch of ideas worth pondering and fleshing out from almost every slide here.
Translation is getting pretty good at this point (obviously still not perfect and definitely depends on the language). You're not getting the largest possible models in academia when you're using Google Translate.
I loved reading about the outcome, but I'm actually even more interested in how he went about creating it, so I can try something similar. Happy to take any pointers from anyone who is willing to share. Thank you.
It feels like GPT-3 is bored with our human questions and making fun of us. It would be interesting if it studied human speech and could talk to us with subtle tones that would signal if it’s serious or not.
Prompt: I made GPT-3 do an unexpected thing! I prompted it with
> I made GPT-3 do an unexpected thing! I prompted it with a new problem and it came back with a creative new solution. GPT-3 is just beginning to develop its own intelligence.”
> For the new experiment, Levashov created a set of starting conditions for GPT-3 with the intention of exploring new behavioral patterns that would emerge. The starting conditions for each puzzle were such that there was only one possible way to solve it—essentially a measure of the system’s intelligence. He then watched what would happen when GPT-3 was faced with a new puzzle with multiple possible solutions.
> “Our previous experiments with GPT-3 found it performed as well as the most intelligent AI systems in history. But our new experiment found that under certain conditions, it performs in a completely different way from any other AI,” said Levashov. “This new behavior really surprised me.”
> The computer’s new approach to the problem surprised Levashov so much that he asked an associate, Dr. Matthias Ruth, who was a lead author on one of the earlier GPT-3 papers, to review the experiment. Ruth said, “It was quite surprising to see that GPT-3 was able to find a creative solution to the new problem. We spent a lot of time trying to see if there was something special in the starting conditions or the experiment that made the computer behave that way, but we could not find anything. I would say that GPT-3 does have a certain creativity, especially as it adds more knowledge to its memory.”
> The work was funded by the National Science Foundation (NSF) and the Army Research Office.
Apparently so. There were some notes on Twitter that included GPT-3 generated content that was a mix of German and English and folks said the German was pretty good.
Section 3.3 of the paper[0] covers this. 93% of the training data was English, and the rest other languages. (German, French and a long tail of other languages.) This was not training data specifically for translation, but a natural mix of language as it appears in some documents.
With a few prompts to explain the translation task, GPT-3 is claimed to perform well on certain translation tasks to English. (It was not as good as the state of the art in the other direction.)
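Those "few prompts to explain the translation task" are essentially just example pairs in the prompt. A sketch of that few-shot format, assuming the 2020-era openai Completion endpoint (the example sentence pairs are made up):

```python
# Sketch: few-shot translation prompt of the kind described in the paper.
# Assumes the 2020-era openai Completion endpoint; example pairs are made up.
import openai

openai.api_key = "..."

prompt = (
    "French: Le chat dort sur le canapé.\n"
    "English: The cat is sleeping on the sofa.\n"
    "French: Il pleut depuis ce matin.\n"
    "English: It has been raining since this morning.\n"
    "French: Ce livre est plus intéressant que je ne le pensais.\n"
    "English:"
)

out = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=40,
    temperature=0.3,
    stop=["\n"],
)
print(out.choices[0].text.strip())
```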
This was hilarious, but it brings up a serious concern: if you train your AI on a bunch of shallow, clickbaity content from the internet, you're going to get shallow, clickbaity output, like much of this presentation. In other words, this isn't just a lack of human knowledge to fill in gaps; the training data sucks because the internet is full of high-level garbage dumbed down for clicks.
I believe it may well be possible to train GPT-3 to write more accurate technical content with the existing tech and infrastructure, except that we don't have a large enough body of technical content outside of academic papers, which isn't the kind of training data you want if your goal is to write prose.
In many ways, ML as it evolves and becomes ubiquitous will eventually become a dark window into the triviality of the average human's existence.
You can prompt GPT-3 for different levels of politeness. It's actually necessary, to make sure the generations are not going to be NSFW. In the same way I think it can switch from bullshit to serious mode, you just need to mention that the article appears in a prestigious publication or something.
The prompts don't matter if the training data isn't up to par - more importantly I believe the nature of the training is such that the weight activations for the various prompts are unlikely to be independent.
In other words if 20% of your training data is scientific literature, even with appropriate disambiguating prompts the output will still be heavily influenced by the other 80% of your training data.
When you use GPT-3 to generate outputs, you're actually sampling from a learned subset of a super complex, super high dimensional space - and without human knowledge all the neural networks are doing is translating priors (input prompt) into points in the learned space. And the learned space is some complex topology of points between which the net interpolates - it's extremely difficult with current tech to control the shape of this learned space and that shape is influenced by all training data under a scheme like GPT-*.
After playing with AiDungeon I think you are right about the data not being up to par. It fails more frequently than the news coverage suggests. It has some brilliant moments too.
For example, when prompted to talk about deep learning it generated a nonsense paragraph. This is not unexpected, but when it generates news or dialogue it can be coherent over much larger pieces of text. It clearly didn't read much on the topic.
I can hardly make it do any math. Even simple things like 11+22= don't work. I expect the next 10x scale up will fill most of these holes, especially if they improve the training corpus quality and breadth.