Spam vs Mahalo: Matt Cutts Explains the Difference

melvinram · on Feb 22, 2010

Matt Cutts surely can't ignore this debate for much longer. The least he could do is chime in with Google's official perspective on the situation whether it's for or against the Mahalo approach.

shawndrost · on Feb 22, 2010

It looks like he's unable to comment at the moment: http://news.ycombinator.com/item?id=1143911

keltex · on Feb 22, 2010

That's not the real Matt Cutts

boundlessdreamz · on Feb 23, 2010

The real Matt Cutts has said he is aware of the mahalo problem. see this http://twitter.com/mattcutts/statuses/9513414423 and the thread.

NZ_Matt · on Feb 22, 2010

There is a large incentive for google to send traffic to Maholo. The low quality of the pages causes the user to want to exit the page. What do they click on when leaving? the adsense ads! I'm not saying Google are going out of their way to support this but it is easy to see why they may be hesitant to penalize Maholo.

pierrefar · on Feb 22, 2010

This is doing the rounds among SEOs on Twitter: http://www.mahalo.com/search?q=mattcutts%20is%20gay%20and%20...

simonw · on Feb 22, 2010

The "is gay" thing is lame.

mvandemar · on Feb 22, 2010

Yeah, lamer though that Mahalo turned it into a permanent page: http://www.mahalo.com/mattcutts-is-gay-and-wont-ban-us

pierrefar · on Feb 22, 2010

The whole situation is lame.

bhp · on Feb 22, 2010

All these Calacanis / SEO Book cat fights are getting annoying.

byrneseyeview · on Feb 22, 2010

They're also important. If Mahalo, Demand Media, etc. all continue to do what they're doing, then they'll have built a publisher-paid paywall around most of the content online.

That is bad news for anyone who tries to get long-tail organic search traffic. It's good news for sites with great brand names, but terrible news for anyone else.

qeorge · on Feb 22, 2010

Its not fair to lump Demand in with Mahalo. Demand's business model is filling in holes with their own original content, not scraping (or "aggregating") other people's content.

AFAIK, Google views their relationship as symbiotic, not parasitic.

byrneseyeview · on Feb 22, 2010

Demand Media does not produce high quality content. They product content that is good enough to rank (i.e. it's written in English) and unique. But they have a strong incentive to have bad content! If your article on "how to make pancakes" tells someone how to make pancakes, they close their tab and make pancakes; if it's 300 words of "original content" that makes no sense, you'll end up clicking through to another site (that has to pay for the privilege).

When you think of how many struggling freelancers use those long-tail guides to build their business ("How to shoot a commercial for a gym," or "How to write brochure copy for life insurance,"), you can see the magnitude of this problem. People who could trade their time for traffic now have to trade their money for traffic. When they're just getting started, money is harder to come by than time. The result: fewer people creating this kind of content, more of them joining organizations that pay for the traffic instead.

qeorge · on Feb 22, 2010

As I understand it, their algorithm looks for keywords with little to no competition, but at least ~$10/year potential in AdSense revenue. If they can find such a keyword, they pay someone ~$5 to write an article or make a video on the topic.

The result isn't Pulitzer worthy, but its a lot better than you're making it out to be.

Using your example, here's the wikiHow article on "how to make pancakes":

http://www.wikihow.com/Make-Pancakes

Its quite informative. How would you suggest they improve it?

greyman · on Feb 22, 2010

But, doesn't Demand only occupy one slot in search engines result pages for a certain keyword search? Even if Demand would create billions of articles, one for each conceivable search term, other nine spots in first 10 search results are still available for others, aren't they?

All in all, I still think this is primary a Google problem, not Demand's. They can publish anything they want, it's their right protected under free speech. It's Google which should be concerned about quality of their pages.

byrneseyeview · on Feb 22, 2010

The #1 search result gets about 40% of all organic clicks. And Demand Media has more than one property (eHow, Wikihow, Cracked, livestrong.com). So I wouldn't be surprised if there were some searches for which Demand Media got more than half of all traffic.

You are correct about this being Google's problem. These guys exist to exploit an arbitrage opportunity: Google's search algorithm picks them, and the average searcher's which-engine-do-I-choose algorithm picks Google. In the long run, one of these things will stop being true.

qeorge · on Feb 22, 2010

Cracked is not owned by Demand Media. It was a second-rate competitor to MAD as a print magazine, but has found tremendous success online.

Their model is similar to Demand's in that it is UGC + payment, but that's about it. The topics are not generated by an algorithm, for instance.

byrneseyeview · on Feb 22, 2010

http://www.demandmedia.com/brands/

I was surprised, too.

skinnymuch · on Feb 23, 2010

Wow. Can't believe they own Cracked. Learning this made me die a little inside.

_delirium · on Feb 23, 2010

There's something about their content feels fake, though. When I run across one of their pages in a search, it's not always terrible, but if I don't initially notice where I am, I sometimes get this weird feeling that the text I'm reading isn't quite right, and might not actually contain any information. It frequently has this pumping-out-words-to-fill-the-page feeling, very strongly reminiscent of high-school students' essays.

bhp · on Feb 22, 2010

The concept is important, but the personal attacks (link bait) is what I'm referring to, and is unnecessary.

detst · on Feb 22, 2010

Perhaps unnecessary but they certainly draw more attention and increase the likelihood that something will be done.

zaidf · on Feb 22, 2010

I thought Demand Media was all about original content?

byrneseyeview · on Feb 22, 2010

They are about "original content" in the Google sense. It's not quality content, just well-SEO'd stuff that is 1) good enough for Google to think it's grammatically-correct English, but 2) bad enough that contextual ads on the article will still get clicks.

patio11 · on Feb 23, 2010

Guys, draw a big circle around point #2 here and remember it: Demand Media understands the user psychology of CPC ads better than any Valley startup you've ever heard of. They want their content to suck because if it satisfies the users' needs then the users will not click the ads. If, on the other hand, it introduces the topic and leaves the user feeling "Where do I go next?", then the only options from a Demand Media page are either a) other worthless articles or b) CPC ads. And people choose door #2 quite frequently.

This kind of frustrates me to the extent that I compete with Demand Media (limited, but real), because there is absolutely no circumstance under which a Demand Media article is a better result for the user than a page on my site if we're in competition, but Demand Media has the economics of it nailed such that they can produce content for some genres at scales I can't possibly keep up without duplicating their core strategy (automated keyword selection -> cheap freelancer -> post without substantial quality control), pushing my related content out of the SERPs and then charging me for the clicks I would otherwise be getting for free.

byrneseyeview · on Feb 23, 2010

I got Point #2 from your blog, by the way. So thanks!

enjo · on Feb 23, 2010

Do you rank better for pages you do compete on head-to-head?

bobbyi · on Feb 22, 2010

They are important, but this one didn't really add any new information beyond the ones we've already seen.

vaksel · on Feb 22, 2010

I disagree.

Double standards need to be exposed.

If you do any of these things that Mahalo is doing, you'd get delisted from Google in no time.

brown9-2 · on Feb 22, 2010

I wish someone would summarize this fight for those of us who haven't been following it blow-by-blow.

plinkplonk · on Feb 22, 2010

"All these Calacanis / SEO Book cat fights are getting annoying"

the catfights landing on HN (and getting voted up) is doubly annoying!

ashu · on Feb 22, 2010

How can Google NOT look at that sample page Aaron has displayed? That is truly terrible :( Also, Matt Cutts is generally very honest and public, but in this case, uncharacteristically quiet.

prawn · on Feb 22, 2010

If you visit a page on Mahalo that you think doesn't cut it, be sure to hit the 'Ads by Google' link and send feedback to Google about it.

jasonlbaptiste · on Feb 22, 2010

some of these discussions make a lot of sense for HN as many entrepreneurs here are building businesses around content ie - "what is spam?". This one does not.

andrewpbrett · on Feb 22, 2010

"I talked to him, and so I said what software do you use to power your search engine? And he said we use Twika or MediaWiki. You know, wiki software, not C++ not Perl not Python. And at that point it really does move more into a content play. And so it is closer to an About.com than to a Powerset or a Microsoft or Yahoo! Search."

maurycy · on Feb 23, 2010

Depressing.

moultano · on Feb 22, 2010

If the title of an article is a question, the answer is no.

btipling · on Feb 22, 2010

How much original content is there in Google Reader? Not defending Mahalo, especially if it stealing copyrighted content from other sites, but not having original content doesn't mean you don't provide value. Google Reader has value and no original content.

byrneseyeview · on Feb 22, 2010

If Google started giving people public Google Reader pages, would you like it if Scoble's Google Reader outranked your blog when people searched for your blog's name?

There is value in duplicate content. But there's no value in a search query taking you to a search page, where the site you wanted to land on has to pay Mahalo for adding an extra click between you and what you wanted.

bobbyi · on Feb 22, 2010

All of google reader is blocked by their robots.txt, exactly like the article is saying mahalo needs to do for their scraped content http://www.google.com/robots.txt

melvinram · on Feb 22, 2010

see http://news.ycombinator.com/item?id=1143489

benatlas · on Feb 22, 2010

The internet is only moving in the direction where Google points and Google favors quantity over quality. All Internet problems flow from this Google unstated rule.

Google would rather make a cent each on a million stolen pages of crap than $10 on an original content, plus the spam "author" is not going to ask anything for the "content".

gcb · on Feb 23, 2010

manhalo and seo are the reason HN need a friend-foe system like slashdot.

Every time you post anything about it, you prove you don't deserve to exist. Or you are a spam bot.

melito · on Feb 22, 2010

FLAME BAIT IS FLAMING

melito · on Feb 22, 2010

While I can understand downing based on my knee jerk immaturity towards this, but is any of this legitimate?

Will anyone benefit from having read this article?

Are there really that many people interested in this?

Are you all really up voting this stuff because you're legitimately interested?

codexon · on Feb 23, 2010

Many startups here rely on cheap SEO for marketing.

If Mahalo (or anyone else with a high PR domain) can outrank everyone else simply by spamming, it's not fair to a nascent startup

CoryMathews · on Feb 22, 2010

pretty much yes.

jasonmcalacanis · on Feb 22, 2010

This whole thing is insane.... we have "stub" pages just like Wikipedia.

These are topic pages that people are working on and THEY DON'T RANK in search engines until they we get the word count to around 300-500 words.

We are the process of NOINDEXING the pages that are below 300 words just to make Aaron happy... we actually had these noindexed before our last version and that got lost in the shuffle of the new launch (really, it did... when you do new code you might leave something out of the old code).

i'm also getting a list of every page under 300 words and having the page managers build them out in 30 days or deleting them.

Anyway, i thank Aaron for busting out chops and making us better!

The claims that we are "scraping" are absurd... we're using google, bing, twitter, etc. apis to do a comprehensive search page.

i dont know everything about SEO, but i don't understand this claim by Aaron. i think he is trying to start trouble for us... and maybe it will work. Thanks pal!

aaronwall · on Feb 22, 2010

A couple clarifications if you don't mind then ;)

- If you don't want those pages indexed in Google then why are you submitting them in an XML sitemap?

- I have already shown examples of the 0 original content pages ranking, so how can you claim that they do not rank?

- You are not scraping directly, you are pulling from 3rd party sites and using it as content on your own site. Which is worse, because there is no way to opt out of it.

- My problem is not just with what you call stub pages, but with most of your pages. When you give people embed code to embed your content in their site you give them an iframe AND a direct link back to you. If you want me to stop highlighting the absurdity of it then perhaps you should hold yourself to the same standards as what you offer others. But you do just the opposite when you embed 3rd party content in your site. You slap a nofollow on the links and embed the content directly into the page (rather than in an iframe).

- Worth noting that every time I mention the above point you end up talking about stub pages or experiments or some other strategy to try to redirect attention. But in reality, what I am talking about is what you do on almost every page of your website.

jasonmcalacanis · on Feb 22, 2010

These are the pages that get traffic at Mahalo:

1. Breaking news: http://www.mahalo.com/nhra-fan-killed http://www.mahalo.com/andrew-koenig http://www.mahalo.com/bloom-box

2. How To articles http://www.mahalo.com/how-to-speak-french http://www.mahalo.com/how-to-play-guitar-for-newbies

3. Walkthrough articles with our videos! dozens of them here: http://www.mahalo.com/walkthrough http://www.mahalo.com/call-of-duty-modern-warfare-2-walkthro...

is there something wrong with this pages?

jasonmcalacanis · on Feb 22, 2010

1. everything in site is in the sitemap... it's not selective. it will be shortly.

2. they don't get traffic is my point... we look at any page that gets over 100 page views in a month and we build those pages out. so, even if you find a page that ranks it will not have traffic. if it has traffic it gets built out.

3. we are not scraping, we are using search APIs

4. i dont understand this issue of our widgets (which don't get used to be honest.. it's a failed program)

5. this is simply false... our traffic comes from how to articles, walkthroughs and Q&A. if you want to know what the top 10 pages are they are things like how to play guitar and call of duty walkthrough pages. those things are 3-5k words!

just lay off dude... go troll someone else.

aaronwall · on Feb 22, 2010

- 1. everything in site is in the sitemap... it's not selective. it will be shortly.

Ah, so now you admit it was intentional. But good on you for (eventually? hopefully?) fixing it.

- 2. they don't get traffic is my point... we look at any page that gets over 100 page views in a month and we build those pages out. so, even if you find a page that ranks it will not have traffic. if it has traffic it gets built out.

If a person has a quarter million pages that are getting 5 visits each that is still a lot of traffic. Especially when the page has 0 editorial costs.

- 3. we are not scraping, we are using search APIs

The end result is what people would typically call a "scrapper site". It is irrelevant how it is created (if you scrape directly or syndicate from somewhere else that is scraping). The issue is a lack of editorial control (see your page about 13 year old rape) and a lack of citing sources with links.

- 4. i dont understand this issue of our widgets (which don't get used to be honest.. it's a failed program)

Search engines have duplicate content filters. If the content is within the page as HTML (as you do on Mahalo) then you can often outrank the original source for their own content. You bypass this issue and me mentioning it if you only use an iframe to embed the content in your pages. But if you embed it directly into the HTML (as you are doing right now) then of course it is bogus.

- 5. this is simply false... our traffic comes from how to articles, walkthroughs and Q&A. if you want to know what the top 10 pages are they are things like how to play guitar and call of duty walkthrough pages. those things are 3-5k words!

I am not talking about your top 10 pages. I am talking about the bottom 300,000 pages, which in aggregate get far more traffic than the top 10 pages do. :D

- just lay off dude... go troll someone else.

Not trolling at all. Just trying to give you valuable feedback, as you have claimed it to be publicly multiple times (unless you were lying when you stated that) :D

jasonmcalacanis · on Feb 22, 2010

Anyway, we're deleting any short pages right now and noindexing any short pages.

this will all be done in the next 72 hours and then there will be nothing to complain or write about after that Aaron!

Thanks for making us better.

aaronwall · on Feb 22, 2010

Good first step!

Does that mean that (for the remaining pages on the site)...

a.) the other scraped content which exists on the remaining pages will be put in an iframe (rather than as text on the page)

- OR -

b.) that you will be removing nofollow from the pages you are scraping content from?

Either you trust the content enough that you should link to it directly, or you should put it in an iframe such that search engines don't see it. Either route would likely be more akin to fair use than what you are currently doing (automatically scraping 3rd party content into your pages and using it to rank against the content creators, without permission, and without a way of opting out).

btilly · on Feb 22, 2010

After this many lies, do you actually believe he is planning on actually taking that step? In my books he's already used up his credibility. I won't believe anything he says until he says what he has done and someone else publicly verifies it. (He has lied enough that I don't think it worth my time to bother verifying anything he says. The bozo bit is well and truly flipped.)

aaronwall · on Feb 22, 2010

Fair point. ;)

houseabsolute · on Feb 22, 2010

Passive aggressiveness looks good on you.

byrneseyeview · on Feb 22, 2010

What fraction of your traffic comes from those 3-5K-word articles? And what's your return on investment from those compared to, e.g., one of the pages that Aaron linked?

I'm curious! I'm in the content-creation business, and if what you're doing works, I'll either need to radically change what I do or to start copying you.

benatlas · on Feb 23, 2010

This is not a small matter, this destroys the internet for the rest of us. The Internet becomes unusable and untrustworthy. And you are doing this in a profitable collusion with Google. You cynically going where Google pushes you.

lmkg · on Feb 22, 2010

> The claims that we are "scraping" are absurd... we're using google, bing, twitter, etc. apis to do a comprehensive search page.

Some people would consider that scraping. How do you define "scraping" and how does this practice not fall under the definition?

sp332 · on Feb 22, 2010

I think the point is using APIs, not "screen-scraping" - but the end result is mostly the same.

ErrantX · on Feb 22, 2010

> We are the process of NOINDEXING the pages that are below 300 words just to make Aaron happy... we actually had these noindexed before our last version and that got lost in the shuffle of the new launch (really, it did... when you do new code you might leave something out of the old code).

It's been a few weeks: have we seen any evidence of this happening?

mvandemar · on Feb 22, 2010

Newp.

aaronwall · on Feb 22, 2010

Not only are they NOT noindexing them (which they said they would do a couple years ago and a couple weeks ago), but they are still submitting them to Google via an XML sitemap

Likeso http://tinyurl.com/ydyo3ud

cakeface · on Feb 22, 2010

I understand that many of us disagree with with what Jason is saying in his comments, but I don't think that we should be downvoting his comments below 1. He is at least making an attempt at reasoning argument for his actions, and taking the time to post them. It makes the discussion hard to follow when all the detracting comments are a barely visible grey.

mscarborough · on Feb 23, 2010

>We are the process of NOINDEXING the pages that are below 300 words just to make Aaron happy... we actually had these noindexed before our last version and that got lost in the shuffle of the new launch (really, it did... when you do new code you might leave something out of the old code). > >i'm also getting a list of every page under 300 words and having the page managers build them out in 30 days or deleting them.

It's not just Aaron Wall that sees something fishy.

But for those of us who are just tired of the whole drama, just change how it's done, or don't do it at all. Adding nofollow and not submitting auto-generated content in Mahalo's sitemap does not seem like a great amount of development work if you really want to change it.

zaidf · on Feb 22, 2010

Funny. By those standards applied strictly, google's own search result pages would be considered spam. Jason can argue that there is value in the SELECTION of content that is not explicitly visible but nonetheless significant.

Similarly, Hacker News homepage might be considered spam because it's mostly titles of articles. Until you take into account the value of the votes--which is value that these google guidelines don't address clearly.

houseabsolute · on Feb 22, 2010

And you'll notice that Google does not index its own search result pages, nor encourage anyone else to do so. Perhaps "spam" is too pejorative a word and "things that should not be in our search engine" would be a better description.

newman314 · on Feb 23, 2010

For what it's worth, Topsy is doing the same for tweets. I've seen Google show a Topsy search page more than once while searching for things recently.

For example, links such as http://topsy.com/s/toyota+grand+jury+subpoena are submitted into google to be indexed.

newman314 · on Feb 23, 2010

Backtype does the same too as far as I can tell so there are definitely quite a few companies out there with such activities.

This is a bummer as this just serves to increase the noise to the detriment of getting good results.

melvinram · on Feb 22, 2010

Well the debate really is how well should Google reward Mahalo for it's practices. Namely, how much ranking Google should (and other search engines) give to Mahalo on it's SERPs (search engine results pages)?

I don't think Google's search results or even Hacker News really ever shows up too often on SERPs except for their respective brands... so it doesn't really apply. They are destination websites not focused on search traffic.

bmelton · on Feb 22, 2010

"search result pages would be considered spam."

I don't think so. It looks like the policy is in place to keep you from having to perform multiple clicks to get where you want. In the common use case (I visit Google.com, type a search, get a result) I do not have to click on anything more than the search result to get where I am going. The difference is in a use case where my search result takes me to a search results page, which means I am only halfway (or less) to my target page after clicking on the Google search result. This doesn't mean, necessarily that all Mahalo content is bad, but that Google shouldn't be indexing Mahalo search results pages, and instead should link directly to the resultant content pages instead (scraped or not.)

Similarly, Hacker News articles should not be the result of a search done on the news title (as HN's link to the original article actually increases the article's score, right?), but should perform better if the matched term is found in the HN-specific content (e.g., the commentary we create on HN only.)

I'm sure that there are (hopefully) edge cases in which a search for the article title returns the HN link instead of to the original article, but that should be the exception to the rule. Of course, if HN were to have an absurdly high page rank and link to a relatively unknown (or new) blog, it would likely lead to a result on here, instead of the other way 'round.

skinnymuch · on Feb 23, 2010

Google doesn't index their search results.