Ask HN: How do mashups avoid copyright infringement lawsuits?
13 points by villageidiot on Dec 28, 2008 | 18 comments
Services like Dapper, Yahoo Pipes and Feedity facilitate the creation of mashups that are essentially built off other sites' data by retrieving that data as a convenient RSS feed.

Why do we not see more intellectual property lawsuits as a result of these recent applications of screen scraping?

I was looking at the Wikipedia entry for web scraping:

http://en.wikipedia.org/wiki/Web_scraping

Under the "Legal Issues" section it discusses the tort of "trespass to chattels," which has been used to cover a wide variety of computer trespass claims, including web scraping. But from what I can tell there are very few recent lawsuits over web scraping. The big one was Ticketmaster vs. Tickets.com, but that was back in 2000:

http://www.tomwbell.com/NetLaw/Ch07/Ticketmaster.html

Sites like popurls, Alltop, and Google News aggregate data from other news sites. According to copyright law, this should be permissible as long as the quote is limited and not a complete republication, and these sites seem to follow that guideline.

But are there other recent sites that don't, ones you would expect to fall afoul of the law? How do they avoid litigation? By asking the originating site for permission to republish? By paying the content owners for the privilege of republishing? Or has the commonness of the Web 2.0 way of regurgitating content made this a moot issue by now, to the point where most original content owners have given up on trying to control their content through litigation?
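For illustration, here is a minimal sketch (in Python, and not legal advice) of the limited-quote-plus-attribution pattern mentioned above, using the feedparser library; the feed URL and the 200-character excerpt cap are just placeholders:

    # Sketch: republish only a short excerpt of each item and always
    # link back to the original source instead of copying it wholesale.
    import feedparser

    EXCERPT_CHARS = 200  # arbitrary cap; keep the quote limited

    feed = feedparser.parse("http://example.com/news.rss")  # hypothetical feed
    for entry in feed.entries[:10]:
        excerpt = entry.get("summary", "")[:EXCERPT_CHARS]
        print(entry.get("title", ""))
        print("  " + excerpt + "...")
        print("  (source: " + entry.get("link", "") + ")")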




Mashups are not all one thing. Sometimes it's a lone hacker modifying their own data, so licenses are not an issue. Sometimes it's data that the authors intended to be shared. And sometimes people just steal or misuse other people's stuff.

Flickr is all about sharing, and it tries pretty hard to give users tools to control access to their data and to express their licensing intent. For example, as much as possible, there is simply no trace of private data in searches that lack proper authorization. For public data, it is not a given that you can republish or modify the work, so the Atom feeds include links to the license for each photo or video, and the API exposes similar information when obtaining photo info.

Users are able to grant a mashup app access to their private data in a formalized way (and to revoke it later). Finally, Flickr is also able to revoke any particular app's right to download data, or simply throttle it to a reasonable number of requests per day.

All sites that offer RSS or an API should do these sorts of things. (The OAuth standard is a formalization of some of the techniques that sites like Flickr use.)
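Very roughly, a mashup checking a photo's license before republishing it might look something like this (a Python sketch; the API key and photo id are placeholders, and the exact response fields should be double-checked against the current API docs):

    # Sketch: look up a photo's license via the Flickr REST API before
    # deciding whether it can be republished. Key and photo id are fake.
    import requests

    API = "https://api.flickr.com/services/rest/"
    KEY = "YOUR_API_KEY"  # placeholder

    def call(method, **params):
        params.update(method=method, api_key=KEY,
                      format="json", nojsoncallback=1)
        return requests.get(API, params=params).json()

    # Map license ids to names/URLs ("All Rights Reserved", CC-BY, ...)
    licenses = {str(l["id"]): l
                for l in call("flickr.photos.licenses.getInfo")["licenses"]["license"]}

    info = call("flickr.photos.getInfo", photo_id="1234567890")  # placeholder id
    lic = licenses[str(info["photo"]["license"])]
    print("License:", lic["name"], lic.get("url") or "(no url)")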


How do you know these sites are not retrieving content using RSS feeds? In that case the "social contract" says that the content owner consents to syndication.

In reality, any reproduction of content is a violation of copyright law, no ifs, ands, or buts about it. Even Google's act of displaying text from web pages in its search results is infringement, but of course Google is safe because it is so useful and the web would be nothing without search engines. Still, there is no implicit permission being given.

The same rule likely applies to mashups (although I don't know of any that have been taken to court). Why litigate if an app helps users consume your content? There are exceptions, of course; for example, if page views on your actual site decrease and you lose money as a result.


Your claim that "any reproduction of content is a violation of copyright law" is absolutely wrong and has been rejected by the courts many times (Perfect 10 v. Google, Sheffield News v. Sheffield Times).

Google can display text from web pages because it attributes the text to the site and provides a link to the original source. There are limitations on use, which is why Google paid for its book search functionality.


What you have come up with is a fuzzy case, not a clear-cut one. Even with fair use, the way search engines use content could easily be considered infringement.

I still stand by my statement: the only reason search engines are allowed to copy petabytes of COPYRIGHTED material is that they are so darn useful. I don't know of any other service that sidesteps intellectual property rights (whether through fair use or not), makes a good amount of money, and has been allowed to exist and thrive.


The question is: how much is too much? Seems like that's a question for the courts to decide.

The NYT is about to find out the answer to this question:

http://news.cnet.com/8301-1023_3-10128600-93.html


By being too small to bother with.


Or by adding value.

I believe Y! still unofficially ignores the scraping of our address books because it adds value to our users, even though the services doing it break our Ts&Cs and use the username/password anti-pattern.

In the future I think it's more likely we'll clamp down on that stuff, because there is now OAuth to authenticate with.

However, the main point is that because it added value to our users, we let the scrapers get away with it.
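For what it's worth, the difference between the two approaches looks roughly like this (a hedged Python sketch; the endpoints, keys, and tokens are all placeholders, not any real API):

    # Anti-pattern: the mashup asks for the user's actual credentials and
    # logs in as them, so the user hands over full, unrevocable access.
    import requests
    from requests_oauthlib import OAuth1

    requests.post("https://example.com/login",              # placeholder endpoint
                  data={"user": "alice", "password": "her-real-password"})

    # OAuth: the user authorizes the app once at the provider, and the app
    # only ever holds a revocable, scoped token, never the password.
    auth = OAuth1("consumer-key", "consumer-secret",         # app credentials
                  "access-token", "token-secret")            # user-granted token
    requests.get("https://example.com/contacts", auth=auth)  # placeholder endpoint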


So you think that once a mashup passes a certain threshold of popularity it needs to deal with this legal issue, but not before then?


I guess it also depends on whether it generates traffic (revenue, awareness, whatever) for the sites from which it takes its content. Don't bite the hand that feeds you and all that.


A lot of the time it takes major funding or acquisition by another company to make them a target (e.g. YouTube).


YouTube was a target before the acquisition; the rights holders just hadn't moved on it yet. The speed with which Google paid them all off (at the same time it acquired YouTube) suggests it was far from the first time they'd thought of that.


Most definitely. You can't squeeze blood from a stone. A suit by the RIAA calls major attention to the defendant, so it's not something they want to do until they feel they have to.


But given the possibility of suits like these down the line, why would a startup even risk it at the outset by using mashup data obtained through scraping, etc? Isn't it asking for trouble? Although geeks prefer the algorithmic approach, building business partnerships from the start and stating one's intention about sharing data seems to be the only viable starting point for a mashup startup (from a long-term view).


A lot of startup successes happened on the backs of copyright infringement. The goal is to just get traction and then deal with it. It works somewhat often, and it's actually fairly rare a startup is unable to license some sort of deal and survive.

But no, I probably wouldn't do it personally. There's a tradeoff there, in that on one hand you're increasing your odds of success (potentially) by building on the backs of services that you know people want. On the other hand, there's some chance that if you're successful there will be a lawsuit.


When you say "it's actually fairly rare a startup is unable to license some sort of deal and survive", I assume you mean "it's actually fairly rare a startup is able to survive without licensing some sort of deal" rather than "it's actually fairly rare a startup is unable to survive by licensing some sort of deal".

Whoa, my head is spinning.

In other words, usually a startup will be able to make a deal with the content owners that allows it to survive. But I agree it's a risky ballpark to play in. One potentially big advantage, though, is that if you're an unknown, this can raise your profile. So even if the venture is unprofitable because of the legal costs and eventual revenue sharing with the content owners, by working on something that gains traction you can leverage that reputation and experience on a future venture, which may put you in a better position than someone starting from zero.

Or the energy sucked out of you from the lawsuit might make you want to leave the field entirely.


"Or, do mashups in a moderate way so that you are only providing limited quotes of the originating site and providing attribution?"

That's the civil way.

"It is often easier to ask for forgiveness than to ask for permission." -- Grace Hopper


I am not a lawyer, but isn't it just like how comedians can't be sued for copying something and giving it a spin? These are service sites, not content sites, so they are performing a task upon the content and thus it's okay to do whatever, right?


Leaving aside the obvious ethical issue here, what's the solution? Don't do mashups? Or, do mashups in a moderate way so that you are only providing limited quotes of the originating site and providing attribution? Or, say to hell with it, I'll do what I want and only worry about legal problems if I get big enough that someone comes after me?



