Woah. I never gave it my database schema, but it assumes I have a table called "users" (which is accurate) and that there's a timestamp field called "signup_time" for when a user signed up.
I am definitely impressed that it could get this close without knowledge of the schema, and that you can provide additional context about the schema. Seems like there is a lot of potential for building a natural language query engine that is hooked up to a database. I suppose there is always a risk that a user could generate a dangerous query, but that could be mitigated.
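The "dangerous query" risk seems tractable if you treat the model's output as untrusted input. Here's a minimal sketch of what I mean, with a hypothetical complete() standing in for whatever GPT-3 API access you have, and a made-up users schema (none of this is from the article):

    # Sketch: put the schema in the prompt so the model stops guessing column
    # names, then refuse to run anything that isn't a single SELECT statement.
    import re

    SCHEMA = "Table users: id (int), email (text), signup_time (timestamp)"

    def build_prompt(question: str) -> str:
        return (
            f"Schema:\n{SCHEMA}\n\n"
            "Translate the question into a single SQL SELECT statement.\n"
            f"Question: {question}\n"
            "SQL:"
        )

    def is_safe(sql: str) -> bool:
        # Crude allow-list: exactly one statement, and it must be a SELECT.
        stripped = sql.strip().rstrip(";")
        return ";" not in stripped and re.match(r"\s*select\b", stripped, re.IGNORECASE) is not None

    def nl_to_sql(question: str, complete) -> str:
        sql = complete(build_prompt(question))
        if not is_safe(sql):
            raise ValueError(f"Refusing to run generated query: {sql!r}")
        return sql

Pair that with a read-only database user and the blast radius of a bad generation is pretty small.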
Not related to the article but what exactly is "open" about OpenAI?
Nothing. It was a not-for-profit, but it converted itself into a for-profit entity and made an exclusive deal with Microsoft for GPT-3 (not sure how it's exclusive given all the beta API users).
Granted, training your own copy of GPT-3 would be beyond most people's means anyway (I think I read an estimate that training a model that big was a multi-million dollar effort).
I do think it's a bit dodgy not to change the name, though, when you change the core premise.
GPT-3 is the same "tech" as GPT-2, just with more training. GPT-2 is FOSS, and I have a feeling that OpenAI's next architecture (if there ever is one) will also be FOSS.
I think OpenAI just chose a bad name for this for-profit initiative: calling it "GPT-3" makes it sound like they were pivoting the company in a new direction with a new generation of tech.
Really, GPT-3 should have been called something more like "GPT-2 Pro Plus Enterprise SaaS Edition." (Let's say "GPT-2++" for short.) Then it would have been clear that:
1. "GPT-2++" is not a generational leap over "GPT-2";
2. an actual "GPT-3" would come later, and that it would be a new generation of tech; and
3. there would be a commercial "GPT-3++" to go along with "GPT-3", just like "GPT-2++" goes along with "GPT-2".
(I can see why they called it GPT-3, though. Calling it "GPT-2++" probably wouldn't have made for very good news copy.)
You make it sound as if GPT-3 is just the same GPT-2 model with some extra Enterprise-y features thrown in. They're completely different models, trained on different data, and of vastly different sizes: GPT-2 had 1.5B parameters, and GPT-3 has 175B. That's two orders of magnitude larger.
Sure, both models use the same structures (attention layers, mostly), so it's a quantitative change rather than a qualitative one. But there's still a hell of a big difference between the two.
Right, but GPT-2 was the name of the particular ML architecture they were studying the properties of; not the name of any specific model trained on that architecture.
There was a pre-trained GPT-2 model offered for download. The whole "interesting thing" they were publishing about was that models trained under the GPT-2 ML architecture were uniquely good at transfer learning, and so any pre-trained GPT-2 model of sufficient size would be extremely useful as a "seed" for doing your own model training on top of.
They built one such model, but that model was not, itself, "GPT-2."
Keep in mind, the training data for that model is open; you can download it yourself and reproduce the offered base-model from it if you like. That's because GPT-2 (the architecture) was formal academic computer science: journal papers and all. The particular pre-trained model, and its input training data, were just published as experimental data.
It is under that lens that I call GPT-3 "GPT-2++." It's a different model, but it's the same science. The model was never OpenAI's "product." The science itself was/is.
Certainly, the SaaS pre-trained model named "GPT-3" is qualitatively different from the downloadable pre-trained base model people refer to as "GPT-2." But so are all the various trained models people have built by training GPT-2 (the architecture) with their own inputs. The whole class of things trained on that architecture are fundamentally all "GPT-2 models." And so "GPT-3" is just one such "GPT-2 model." Just a really big, surprisingly useful one.
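Concretely, that "seed" usage is just transfer learning on top of the released checkpoint. Here's a rough sketch of what that looks like in practice, assuming the Hugging Face transformers port of the GPT-2 weights and a toy in-memory corpus (none of the names or hyperparameters here come from OpenAI's own setup):

    # Treat the released GPT-2 checkpoint as a "seed": load the pre-trained
    # weights, then keep training on your own text. Real fine-tuning would
    # batch, pad, and shuffle properly; this just shows the shape of it.
    from torch.optim import AdamW
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # smallest released checkpoint
    model = GPT2LMHeadModel.from_pretrained("gpt2")    # pre-trained weights as the starting point
    optimizer = AdamW(model.parameters(), lr=5e-5)

    corpus = ["your domain-specific text goes here", "and more of it here"]

    model.train()
    for epoch in range(3):
        for text in corpus:
            input_ids = tokenizer.encode(text, return_tensors="pt")
            # Plain language-modelling objective; passing labels=input_ids makes
            # the library compute the shifted next-token loss internally.
            loss = model(input_ids, labels=input_ids).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

    model.save_pretrained("gpt2-finetuned")  # your own "GPT-2 model," in the taxonomy above

The result is exactly what I mean by "a GPT-2 model": same architecture, same published weights as the seed, your data on top.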
> Right, but GPT-2 was the name of the particular ML architecture they were studying the properties of; not the name of any specific model trained on that architecture.
That sounds like it would have been a reasonable choice for naming their research, but isn't the abbreviation "GPT" short for "Generative Pre-trained Transformer"? Seems like they're very specifically referring to the pre-trained model, which I would also take from the GPT-2 paper's abstract: "Our largest model, GPT-2, is a 1.5B parameter Transformer[...]" [1]
What I meant by my last statement is that no news outlet would have wanted to talk about "the innovative power of GPT-2 Enterprise." That just sounds fake, honestly. Every SaaS company wants to talk about the "innovative power" of the extra doodads they tack onto the Enterprise plans of their open-core product, when usually nobody is paying for the SaaS because of those doodads, but rather because they want the service, want the ops handled for them, and want enterprise support if it goes down.
But, by marketing it as a new version of the tech, "GPT-3", OpenAI gave journalists something they could actually report on without feeling like they're just shoving a PR release down people's throats. "The new generation of the tech can do all these amazing things; it's a leap forward!" is news. Even though, in this case, it's only a "quantity has a quality all its own" kind of "generational leap."
> OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity.
Certainly makes that statement seem less credible.
What if, and bear with me, strong AI poses real dangers and open sourcing extremely powerful models to everyone (including malicious actors and dictatorial governments) would actually harm humanity more than it benefits it?
> (including malicious actors and dictatorial governments) would actually harm humanity more than it benefits it?
I'm really glad that weapons aren't open source. Imagine if every dictatorship could get its hands on weapons. Luckily, they're hidden behind a paywall. /s
Incorrect. It is still a not-for-profit, which owns a for-profit entity. It is fairly common for charities to own much or all of a for-profit entity (e.g. Hershey Chocolate; or, per today's Matt Levine newsletter, a quarter of Kellogg's is still owned by the original Kellogg charity). And the exclusive deal was not for GPT-3, in the sense of any specific checkpoint, but for the underlying code.
- Hershey is a public company. Most certainly NOT owned by either a charity or a non-profit. The only way a non-profit comes into the picture is that a significant portion of their 'Class B' stock is owned by a trust which is dedicated to a non-profit (the Milton Hershey School). (https://www.thehersheycompany.com/content/dam/corporate-us/d... pp 36-37)
> To be tax-exempt under section 501(c)(3) of the Internal Revenue Code, an organization must be organized and operated exclusively for exempt purposes set forth in section 501(c)(3) [CHARITY], and none of its earnings may inure to any private shareholder or individual [NON-PROFIT].
You can be a charity (albeit not tax exempt) without being a non-profit, and moreover you can be a non-profit without being a charity. (See also https://www.irs.gov/charities-non-profits/other-nonprofits ; and keep in mind that still other types of non-profits are not tax-exempt at all!)
- Trust "owning" Hershey's: If you look at the document I cited, you'll note that the trust (which is still neither a charity nor a non-profit!) owns only 5.5% of Hershey's common stock.