There's a reason why you can't do this commercially and why Google isn't doing i...

moralestapia · on Dec 8, 2022

You're confused: copyright != antitrust violations.

Both sources you provide have zero mentions of the word 'copyright' in them.

Those lawsuits have to do with Google dominating the search market and using that dominance to its advantage in ways that are allegedly unfair.

Copyright law actually allows a service like Google to exist in the first place.

drusepth · on Dec 8, 2022

According to OP's model, for anyone wondering:

>Copyright infringement and antitrust violations are two distinct types of improper use. Plagiarism is an ethical violation that occurs when someone attempts to pass off someone else's work or ideas as their own, without properly giving credit to the original source. It is not against the law, but can have serious consequences such as failing grades, termination, and difficulty finding new employment. On the other hand, copyright infringement occurs when a party takes an action that implicates one or more of the rights listed above without authorization from the copyright owner or an applicable exception or limitation in the copyright law, such as fair use. The most common antitrust violations fall into two categories: agreements to restrain competition and efforts to acquire a monopoly.

isthispermanent · on Dec 9, 2022

Yes, thank you. I'm not confused, and I don't disagree with you. I simply cited the most recent, easily found examples where Google ran into trouble over rich results.

There are plenty of examples out there, including, as mentioned, ones that predate the recent shopping cases. Feel free to dig.

stevewatson301 · on Dec 9, 2022

The possibility that an LLM could trigger a copyright violation strengthens the narrative that Google is harming smaller businesses, and thus can easily be used as a data point in an antitrust lawsuit.

isthispermanent · on Dec 8, 2022

Thinking more on this... I don't think any of these sites will live if they get big enough. And if enough of them pop up it'll draw tons of attention from content sites.

If you want to show that data you'll end up having to work out a license with StackOverflow. Possible, but far more difficult than the current ease of a plug-and-play GPT drop-in.

Do we really think Google hasn't thought of this exact thing already?

stevewatson301 · on Dec 9, 2022

Google is already working on LaMDA and Imagen for conversational search experiences, which is why these projects also wax poetic about "AI safety" -- you don't want to synthesize a politically incorrect or socially unacceptable response to a question asked.

Apart from the copyright issues that parent mentions, there's also the issue of LLMs spewing BS confidently, which is why Google has been hesitant to roll it out as their default.

isthispermanent · on Dec 9, 2022

Agreed.

This post sums up the other issues outside of copyright that these types of services are certain to run into…

1. DeepMind 2. Infrastructure 3. Trust 4. Freshness 5. Habit Breaking

https://www.maxinomics.com/blog/fade-the-chatgpt-hype

jchw · on Dec 8, 2022

Stack Overflow user content is licensed under a Creative Commons license, so it's possible you actually could satisfy the license terms. That said, IANAL, and I have no idea if it's possible to fulfill the SA clause without distributing the model, or something like that.

freediver · on Dec 8, 2022

The reason is likely simpler:

- It is expensive (~0.5c per generated answer)

- It is (currently) slow (2-3 seconds to result)

- It is hard to place ads inside direct answers (probably the most important)

dwaltrip · on Dec 9, 2022

Those problems don't seem insurmountable, especially if it is 10-100x better than Google.

iso1631 · on Dec 8, 2022

If it's a good result I'm sure there are many people that would pay 1c per search. I've made 16 searches today, and far fewer of those were for things I couldn't find with DDG. If I was after something specific I could charge my account with $5 and search away.

freediver · on Dec 8, 2022

I agree, but that is not how Google search (currently) operates.

iso1631 · on Dec 9, 2022

Great opportunity for someone to disrupt

Costs are only going to come down.

lossolo · on Dec 8, 2022

What's funny is that most of these groundbreaking LLMs you see now are based on Google's published research on transformers, and Google has better-performing models in house than anything publicly available on the market.

notpushkin · on Dec 9, 2022

Note that pulling the meat of the content from StackOverflow isn't a copyright violation though, as long as you follow the license (which is Creative Commons something-something but probably fine for this particular application).

worldsavior · on Dec 8, 2022

But it's citing the sources, so how is it a copyright violation?

fabianhjr · on Dec 8, 2022

Citing does not confer a license

moralestapia · on Dec 8, 2022

You don't need a license to cite others.

shawnz · on Dec 8, 2022

But what about when you're also reproducing the content on your own page like what's being done here?

moralestapia · on Dec 8, 2022

It's tricky but you don't need a license for that either.

By tricky I mean that you would be infringing copyright law only under very specific circumstances, like maybe if the content was private in the first place; but in that case you wouldn't be infringing copyright either, you would just be breaking privacy laws/terms.

I honestly can't think of an example where you would get in trouble by citing a piece of content that belongs to someone else, but I'm not closed to the possibility that it could happen.

thewataccount · on Dec 8, 2022

> only under very specific circumstances you would be infringing copyright laws

An extreme example: you cannot just upload a complete movie and add "credit to disney".

> you wouldn't be infringing copyright either, you would just be breaking privacy laws/terms.

Maybe we're from different countries, but under US law it would fall under theft, the Computer Fraud and Abuse Act, and/or copyright violation; there are no privacy laws applicable here unless we're talking about PII.

Extracting very specific examples from an article or blog is almost certainly going to fall under fair use. However, I've seen several cases where it essentially just returns an entire article as the answer, which would certainly be legally ambiguous.

crystaln · on Dec 8, 2022

Look up the fair use doctrine. You can reproduce parts of content.

fabianhjr · on Dec 9, 2022

One of the four factors is market impact, which this use would likely fail.

In the words of ChatGPT:

> When determining the potential impact on the market for the original work, courts will consider whether the use of the copyrighted material is likely to harm the market for the original work. This may include whether the use of the copyrighted material would compete with the original work, such as if it is used as a substitute for the original work or if it would reduce the demand for the original work.

As such, this is at least not clearly a fair use case (and arguably not a fair use case at all).

moralestapia · on Dec 8, 2022

It's not, I disagree with GP's argument. Safe harbors in copyright law exist to allow this.

isthispermanent · on Dec 8, 2022

It quite likely fails the fair use test...

1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; 2. the nature of the copyrighted work; 3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and 4. the effect of the use upon the potential market for or value of the copyrighted work.

Particularly 3 & 4.

https://en.wikipedia.org/wiki/Fair_use#U.S._fair_use_factors

gnopgnip · on Dec 8, 2022

Safe harbor covers hosts of user-uploaded content. The copyright owner can still pursue the infringer, who is not protected by safe harbor.

worldsavior · on Dec 8, 2022

Though now that I think about it more, it might be damaging the site's sources of income (ads, etc.). But it's still not a copyright violation.

isthispermanent · on Dec 8, 2022

"it might be damaging the site's sources of income"

Which is one of the key factors in determining fair use, and fair use falls under copyright.

moralestapia · on Dec 8, 2022

It could also be bringing new money to those sites by referring users to them.

So, ¯\_(ツ)_/¯.

The issue with Google had more to do with antitrust behavior than with copyright infringement.

shawnz · on Dec 8, 2022

I might be bringing new money to movie publishers by pirating their movies and sending clips to my friends, but that's not a valid basis for calling it fair use

MuffinFlavored · on Dec 9, 2022

> Pulling the meat of the content from a site like StackOverflow ends up as a copyright/anti-trust violation.

Then how did ChatGPT do it?

charlesju · on Dec 9, 2022

Nonprofit, right?