Well, it seems like it's got some cool tech based on providing access to everybody else's copyrighted content in novel ways, a bunch of promise to make a lot of money without knowing 100% what the plan is, messy politics at the top, and it's around the San Francisco area, so....
> providing access to everybody else's copyrighted content in novel ways
This is a cartoonish interpretation of what Generative AI does. You might be coming from a good place trying to defend the "little guy" who supposedly is getting ripped off by GenAI (they aren't), but in practice you are helping copyright trolling and big rusty corporations that live off the perpetual copyright scam.
How is calling out violations of the GPL license helping copyright trolling?
If someone publishes GPL-licensed code on GitHub, and OpenAI then modifies that code in some answer it provides to someone asking a coding question, then OpenAI is in violation of the GPL if they don't also license their stuff under the GPL.
Citation very much needed, in particular if you're claiming that OpenAI would need to GPL license the OpenAI code ("their stuff") in order to provide the answer (as opposed to the lesser question of whether the code in the text of that answer ought to be covered by GPL).
GPL stipulates that derivative works must be licensed under GPL.
Just because an entire industry has their entire future riding on courts eventually deciding that "Yeah, but copyright doesn't apply to AI", that doesn't make it so.
Indeed; I think there's a genuine question as to whether the output of ChatGPT is a derivative work under the law or under our feelings of what's moral.
I've read a lot of code in my career, much of it GPL. When I now write code, it's based in part on that previous code reading. I think most people agree that doesn't mean that all code that I write must be licensed under GPL. In other words, they agree it's not derivative. Even if I write a C strcpy implementation that's functionally identical to one in glibc, I think most of us agree it's not derivative, even if I've read glibc's implementation before. Or if someone asks me "how can I write strcpy in terms of memcpy and strlen?" and I answer them with code I write that's very close to glibc's implementation.
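To make the strcpy example concrete, here is a minimal sketch of the kind of answer I mean. The name my_strcpy is made up to avoid clashing with the libc function, and this is written from scratch rather than copied from glibc. The point is that there are only so many ways to write it:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name, chosen only to avoid colliding with libc's strcpy.
 * Copies the NUL-terminated string src into dst and returns dst.
 * As with strcpy, the caller must ensure dst has room for
 * strlen(src) + 1 bytes and that the buffers do not overlap. */
char *my_strcpy(char *dst, const char *src)
{
    size_t len = strlen(src);   /* length without the terminating NUL */
    memcpy(dst, src, len + 1);  /* +1 so the terminating NUL is copied too */
    return dst;
}
```

Anyone asked to write strcpy this way will land on something nearly identical, whether or not they have read another implementation first.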
Is AI-written code fundamentally different from bio-intelligence-written code in that regard? (I think the answer is uncertain in terms of "how should it work?", not that it's clearly one way or the other.)
> providing access to everybody else's copyrighted content in novel ways
That's a very biased characterization that treats a debate people are having right now as basically already settled. It's simply not truthful, unless they started a side business in developing a torrent tracker or something.
I'm not sure whether you think I'm being unfair to Google or to OpenAI.
Everybody and their brother sued Google early on for a huge variety of their products. Google News got sued for showing headlines from news websites. Google Books got sued for copyright violations for, y'know, making a copy of everybody's books. Back in 2007, Google would've been in the middle of the Viacom vs YouTube lawsuit. The whole idea of a search engine is fundamentally about taking all of the useful and mostly copyrighted content out there owned by others and profiting off of it by becoming the gateway to it.
OpenAI, similarly, works by taking all of the text and art and everything in the world, most of it owned by others, then copying it, collating it, and compressing it down into a model. Then they provide access to it in novel ways. I make no representation about whether it's legal or ethical. It's transformative, useful, novel, and really cool, but it's clearly taking other people's data, and then making it useful and accessible in a novel way.
A big difference in my view is that OpenAI doesn't meaningfully "provide access" to all the text and art and everything in the world; rather, they suck the marrow out of all the text and art and everything in the world and use it to sell their own replacement service for all the text and art and everything in the world. Google has done some questionable things to become a gateway to the world's work, but they're (mostly) still a gateway, not a copy.
Maybe we can agree they are still _mostly_ a gateway. But their ambition to answer your question with zero clicks (i.e., show the answer right in their search results) does make them profit from direct copying. In those cases, the copyright owner will not get any clicks, show any ads, or get any other kind of feedback on their work.
Yeah, for sure, Google wants to show the answer on the results page, and for many search queries they're able to achieve this in one way or another -- either through the knowledge graph, the question-answering slop, or in the snippets of text pulled from the search results themselves. They do, at least, show links to sources in these cases, which is better than ChatGPT (or Bard, to the extent that it's used). But I agree that there's not a lot of short-term incentive for Google to cite sources in a prominent way, and there is a lot of incentive for them to develop features that replace the websites that made them valuable in the first place. There's always been an uneasy bargain between Google and webmasters, and there's always been a tension between what's best for the Google user, best for Google, and best for the Web. If there's a similar bargain with OpenAI, I don't see them approaching it with nearly as much respect: source attribution has not been prioritized in any meaningful way.
I think the characterization is unfair to generative AI, because the information that is retained in the model is a very lossy and rough generalization of the training data - that situation is a lot less direct than what it's like with search engines. In my mind, the lawsuits against search engines have a lot more ground to stand on. Saying that something "provides access to copyrighted content" almost implies that there's some mechanism that allows the user to wholesale download complete, unedited and exact copies of some copyrighted material - I could see that argument with a search engine, but not really with a text or image generator.
It's 100% dependent on copyrighted material, at least: even though you can't access an exact copy of it, without access to the material the model wouldn't exist. It's a messy issue, and I have no idea how it will be solved, since it's a rather new development for humanity. In the past, when humans collected knowledge from different sources to use it in a novel way, there would be at least some kind of attribution or recognition of the sources, either cited or acknowledged in a preface. With GenAI there's nothing, and probably not even a way for GenAI to tell us where it got "inspired" from to generate something.
It's going to be a very messy landscape for copyright and intellectual property in the coming years.
> In Washington, DC, though, there seems to be a growing consensus that the tech giants need to cough up.
> Today, at a Senate hearing on AI’s impact on journalism, lawmakers from both sides of the aisle agreed that OpenAI and others should pay media outlets for using their work in AI projects. “It’s not only morally right,” said Richard Blumenthal, the Democrat who chairs the Judiciary Subcommittee on Privacy, Technology, and the Law that held the hearing. “It’s legally required.”
> Josh Hawley, a Republican working with Blumenthal on AI legislation, agreed. “It shouldn’t be that just because the biggest companies in the world want to gobble up your data, they should be able to do it,” he said.
To clarify, this was a sarcastic remark. I wasn't trying to actually imply that it was okay just because everyone else was doing it. I suppose that little /s really can do a lot of work.