Commercial web scraping - is it stealing?

nopal · on Oct 13, 2010

Aren't sites able to prevent this type of thing through a prominent terms of use link on every page? (Ticketmaster 2003, Cairo v. CrossMedia Services)

Is it that this is still a legal gray area, or is it that big companies can roll over small companies and individuals?

Ticketmaster - http://itlaw.wikia.com/wiki/Ticketmaster_v._Tickets.com

Cairo v. CrossMedia - http://itlaw.wikia.com/wiki/Cairo_v._CrossMedia_Services

hoop · on Oct 13, 2010

In this case it was "big company" versus "small company who is selling the same data." The real issue seems to be that "small company who is selling the same data" feels that "big company" stole from them (instead, they should have bought the data). They did fight back legally, via a cease-and-desist which "big company" complied with, so they kind of won.

Personally, my major concern is an article on something as seemingly trivial as web scraping making its way into the Wall Street Journal.

As you point out, the legal protections are there, but from a technical standpoint how do you prevent that? DRM in HTML6 (</sarcasm>)? I'm concerned because websites that prevent me from right-clicking to "view source" are already annoying enough.

gamble · on Oct 13, 2010

It's almost always going to violate the site's TOS, so if you're a business that depends on regularly scraping sites without permission, prepare to change your business model or be sued (e.g., Octopart vs. Mouser and Digikey).

wpeterson · on Oct 13, 2010

There's a lot to be concerned about here for anyone who provides a data mining backed web application or service.

At PatientsLikeMe, patients are trading use of their information for free access to data analysis tools and a social community.

AndrewDucker · on Oct 13, 2010

So, when are we going to get a law making it illegal to violate robots.txt?
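For anyone unfamiliar with what "violating robots.txt" means in practice: the file is only a set of advisory rules, and it is up to the crawler to check them before fetching a page. A minimal sketch of that check using Python's standard `urllib.robotparser` follows. The rules, the `ExampleBot` user agent, the example.com URLs, and the `is_allowed` helper are all made up for illustration; a real crawler would download the site's own robots.txt instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; it does not fetch anything itself.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/data"):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

Nothing stops a scraper from skipping this check entirely, which is why the question of legal backing comes up at all.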

hoop · on Oct 13, 2010

Good question. Probably on a timeline similar to the gap between the first major news coverage of email spam and the CAN-SPAM Act.