Hacker News
Commercial web scraping - is it stealing? (wsj.com)
18 points by hoop on Oct 13, 2010 | 6 comments



Aren't sites able to prevent this type of thing through a prominent terms of use link on every page? (Ticketmaster 2003, Cairo v. CrossMedia Services)

Is it that this is still a legal gray area, or is it that big companies can roll over small companies and individuals?

Ticketmaster - http://itlaw.wikia.com/wiki/Ticketmaster_v._Tickets.com

Cairo v. CrossMedia - http://itlaw.wikia.com/wiki/Cairo_v._CrossMedia_Services


In this case it was "big company" versus "small company who is selling the same data." The real issue seems to be that "small company who is selling the same data" feels that "big company" stole from them (instead, they should have bought the data). They did fight back legally, via a cease-and-desist which "big company" complied with, so they kind of won.

Personally, my major concern is an article on something as seemingly trivial as web scraping making its way into the Wall Street Journal.

As you point out, the legal protections are there, but from a technical standpoint, how do you prevent that? DRM in HTML6 (</sarcasm>)? I'm concerned because websites that prevent me from right-clicking to "view source" are already annoying enough.


It's almost always going to violate the site's TOS, so if you're a business that depends on regularly scraping sites without permission, prepare to change your business model or be sued (e.g., Octopart vs. Mouser and Digikey).


There's a lot to be concerned about here for anyone who provides a data mining backed web application or service.

At PatientsLikeMe, patients are trading use of their information for free access to data analysis tools and a social community.


So, when are we going to get a law making it illegal to violate robots.txt?


Good question. Probably on a timeline similar to the gap between the first major news coverage of email spam and the CAN-SPAM Act.



