> If web scraping is legal Source? That LinkedIn case did not resolve how you th...

bri3d · on April 9, 2023

My understanding is that the current web scraping situation is this:

* Web scraping is not a CFAA violation (EF Cultural Travel v. Zefer, hiQ Labs v. LinkedIn).

* Scraping a public website in spite of a clickthrough / click-in ToS "violation" does not constitute an enforceable breach of contract or trespass to chattels (i.e., incidental damage to a website due to access), or really mean anything at all. This is less clear once a user account or log-in process is involved. (Intel v. Hamidi, Ticketmaster v. Tickets.com)

* Publishing or using scraped data may still violate copyright, just as if the data had been acquired through any means other than scraping. (AP v. Meltwater, Facebook v. Power Ventures)

So this boils down to two fundamental questions that will need to get answered regardless of "scraping" being involved: "is GPT output copyrightable" and "is training a model on copyrighted data a copyright infringement."

visarga · on April 9, 2023

Does training a model on second-hand data amount to laundering copyright? By second-hand data I mean data generated by a model that was itself trained on copyrighted content.

Let's say I train a diffusion model on ten million images generated by diffusion models that have seen copyrighted data. I make sure to remove near duplicates from my training set. My model will only learn the styles but not the exact composition of the original dataset. So it won't be able to replicate original work, because it has never seen any original work.
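To make the "remove near duplicates" step concrete, here is a minimal sketch, assuming every image has already been downscaled to the same small grayscale grid. It uses a simple average hash plus a Hamming-distance threshold; the function names and the `max_distance` value are hypothetical, and a real pipeline would likely use a perceptual-hash library or embedding similarity instead.

```python
from typing import List, Sequence

# Assumption: each image is a small grayscale grid (rows of 0-255 ints),
# and all images share the same dimensions.
Image = Sequence[Sequence[int]]


def average_hash(image: Image) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(images: List[Image], max_distance: int = 2) -> List[Image]:
    """Keep an image only if its hash is far from every image kept so far."""
    kept: List[Image] = []
    kept_hashes: List[int] = []
    for img in images:
        h = average_hash(img)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept.append(img)
            kept_hashes.append(h)
    return kept
```

The pairwise comparison is quadratic in the number of kept images, so at ten million images this would need an index or bucketing scheme on top; the sketch only shows the filtering idea.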

Is this a neat way of separating ideas from their expression? Copyright should only cover expression. This kind of information laundering follows the definition to the letter and only takes the part that is ok to take - the ideas, hiding the original expression.

seydor · on April 9, 2023

If OpenAI tries to make a legal claim against this, they will be reminded that their own model is trained on tons of unlicensed content scraped without consent. If their training is legal, then this is legal too.

sebzim4500 · on April 9, 2023

The judgement in the LinkedIn case was that if the scraping bots had 'clicked the button' to accept the terms, then they should be held to those terms.

mountainriver · on April 9, 2023

It’s legal, but if your robots.txt says you don’t consent to people doing it, you can sue them civilly.
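For reference, this is roughly how a crawler would check what a robots.txt permits, sketched with Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` and `SomeCrawler` user agents, and the `is_allowed` helper are all made up for illustration. This only shows what the file asks for, not whether ignoring it has any legal consequence.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real crawler would fetch the file
# from the site instead of hard-coding it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A generic crawler may fetch public pages but not anything under /private/.
print(is_allowed(ROBOTS_TXT, "SomeCrawler", "https://example.com/public/page"))
print(is_allowed(ROBOTS_TXT, "SomeCrawler", "https://example.com/private/page"))
# ExampleBot is disallowed from the whole site.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/public/page"))
```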