This whole thing is insane... we have "stub" pages just like Wikipedia.
These are topic pages that people are working on and THEY DON'T RANK in search engines until we get the word count to around 300-500 words.
We are in the process of NOINDEXING the pages that are below 300 words just to make Aaron happy... we actually had these noindexed before our last version and that got lost in the shuffle of the new launch (really, it did... when you write new code you can leave something out of the old code).
I'm also getting a list of every page under 300 words and having the page managers build them out in 30 days or delete them.
Anyway, I thank Aaron for busting our chops and making us better!
The claims that we are "scraping" are absurd... we're using Google, Bing, Twitter, etc. APIs to build a comprehensive search page.
I don't know everything about SEO, but I don't understand this claim by Aaron. I think he is trying to start trouble for us... and maybe it will work. Thanks, pal!
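The noindex-by-word-count policy described above is mechanically simple. A minimal sketch of what such a filter could look like — this is purely illustrative, not Mahalo's actual code; the threshold and function names are assumptions:

```python
# Illustrative sketch of a word-count-based noindex policy.
# NOT Mahalo's real implementation; threshold and names are assumed.

WORD_THRESHOLD = 300  # pages below this get noindexed, per the comment

def word_count(text: str) -> int:
    """Count whitespace-separated words in the page's body text."""
    return len(text.split())

def robots_meta(text: str, threshold: int = WORD_THRESHOLD) -> str:
    """Return the robots meta tag a page of this length would get."""
    if word_count(text) < threshold:
        return '<meta name="robots" content="noindex">'
    return '<meta name="robots" content="index,follow">'
```

The point being: emitting one conditional meta tag per page is a small change, which is part of why the delay draws skepticism later in the thread.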
- If you don't want those pages indexed in Google then why are you submitting them in an XML sitemap?
- I have already shown examples of the 0 original content pages ranking, so how can you claim that they do not rank?
- You are not scraping directly; you are pulling from 3rd-party sites and using their content as content on your own site. Which is worse, because there is no way to opt out of it.
- My problem is not just with what you call stub pages, but with most of your pages. When you give people embed code to embed your content in their site you give them an iframe AND a direct link back to you. If you want me to stop highlighting the absurdity of it then perhaps you should hold yourself to the same standards as what you offer others. But you do just the opposite when you embed 3rd party content in your site. You slap a nofollow on the links and embed the content directly into the page (rather than in an iframe).
- Worth noting that every time I mention the above point you end up talking about stub pages or experiments or some other strategy to try to redirect attention. But in reality, what I am talking about is what you do on almost every page of your website.
1. Everything in the site is in the sitemap... it's not selective. It will be shortly.
2. They don't get traffic is my point... we look at any page that gets over 100 page views in a month and we build those pages out. So even if you find a page that ranks, it will not have traffic. If it has traffic, it gets built out.
3. We are not scraping, we are using search APIs.
4. I don't understand this issue with our widgets (which, to be honest, don't get used... it's a failed program).
5. This is simply false... our traffic comes from how-to articles, walkthroughs and Q&A. If you want to know what the top 10 pages are, they are things like how-to-play-guitar and Call of Duty walkthrough pages. Those things are 3-5k words!
- 1. Everything in the site is in the sitemap... it's not selective. It will be shortly.
Ah, so now you admit it was intentional. But good on you for (eventually? hopefully?) fixing it.
- 2. They don't get traffic is my point... we look at any page that gets over 100 page views in a month and we build those pages out. So even if you find a page that ranks, it will not have traffic. If it has traffic, it gets built out.
If a person has a quarter million pages that are each getting 5 visits, that is still a lot of traffic. Especially when the pages have zero editorial cost.
- 3. We are not scraping, we are using search APIs.
The end result is what people would typically call a "scraper site". It is irrelevant how it is created (whether you scrape directly or syndicate from somewhere else that does the scraping). The issue is a lack of editorial control (see your page about 13 year old rape) and a lack of citing sources with links.
- 4. I don't understand this issue with our widgets (which, to be honest, don't get used... it's a failed program).
Search engines have duplicate-content filters. If the content is within the page as HTML (as you do on Mahalo), then you can often outrank the original source for their own content. You would bypass this issue (and me mentioning it) if you only used an iframe to embed the content in your pages. But if you embed it directly into the HTML (as you are doing right now), then of course it is bogus.
- 5. This is simply false... our traffic comes from how-to articles, walkthroughs and Q&A. If you want to know what the top 10 pages are, they are things like how-to-play-guitar and Call of Duty walkthrough pages. Those things are 3-5k words!
I am not talking about your top 10 pages. I am talking about the bottom 300,000 pages, which in aggregate get far more traffic than the top 10 pages do. :D
- just lay off, dude... go troll someone else.
Not trolling at all. Just trying to give you valuable feedback, as you have claimed it to be publicly multiple times (unless you were lying when you stated that) :D
Does that mean that (for the remaining pages on the site)...
a.) the other scraped content which exists on the remaining pages will be put in an iframe (rather than as text on the page)
- OR -
b.) that you will be removing nofollow from the pages you are scraping content from?
Either you trust the content enough that you should link to it directly, or you should put it in an iframe such that search engines don't see it. Either route would likely be more akin to fair use than what you are currently doing (automatically scraping 3rd party content into your pages and using it to rank against the content creators, without permission, and without a way of opting out).
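The two alternatives laid out above can be made concrete. A hedged sketch, for illustration only — the markup, dimensions, and function names here are assumptions, not Mahalo's real embed code:

```python
# Sketch of the two embed alternatives Aaron describes.
# Markup and names are illustrative assumptions, not Mahalo's actual code.

def iframe_embed(url: str) -> str:
    """Option (a): put 3rd-party content in an iframe. Search engines
    attribute framed content to its own URL, so the host page cannot
    outrank the original source with it."""
    return f'<iframe src="{url}" width="600" height="400"></iframe>'

def inline_embed(snippet: str, source_url: str) -> str:
    """Option (b): inline the content in the page HTML, but credit the
    source with a plain followed link (no rel="nofollow")."""
    return (f'<blockquote>{snippet}</blockquote>\n'
            f'<a href="{source_url}">source</a>')
```

Either variant addresses the complaint: (a) keeps the content out of the host page's indexed HTML, while (b) keeps it inline but passes credit to the creator.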
After this many lies, do you actually believe he is planning on taking that step? In my book he's already used up his credibility. I won't believe anything he says until he says what he has done and someone else publicly verifies it. (He has lied enough that I don't think it worth my time to bother verifying anything he says. The bozo bit is well and truly flipped.)
What fraction of your traffic comes from those 3-5K-word articles? And what's your return on investment from those compared to, e.g., one of the pages that Aaron linked?
I'm curious! I'm in the content-creation business, and if what you're doing works, I'll either need to radically change what I do or to start copying you.
This is not a small matter; it destroys the internet for the rest of us. The internet becomes unusable and untrustworthy. And you are doing this in profitable collusion with Google, cynically going wherever Google pushes you.
> We are in the process of NOINDEXING the pages that are below 300 words just to make Aaron happy... we actually had these noindexed before our last version and that got lost in the shuffle of the new launch (really, it did... when you write new code you can leave something out of the old code).
It's been a few weeks: have we seen any evidence of this happening?
Not only are they NOT noindexing them (which they said they would do a couple years ago and a couple weeks ago), but they are still submitting them to Google via an XML sitemap
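Both claims above (no noindex, still in the XML sitemap) are independently verifiable rather than a matter of taking either side's word for it. A rough sketch of the checks, with deliberately simple parsing and the standard sitemaps.org namespace; fetching is omitted so the checks run on strings:

```python
# Sketch of how a third party could verify the noindex and sitemap claims.
# Parsing is deliberately simple; real pages may need a proper HTML parser.
import re
import xml.etree.ElementTree as ET

# Standard namespace from the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def is_noindexed(html: str) -> bool:
    """True if the page carries <meta name="robots" content="...noindex...">."""
    pattern = r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex'
    return re.search(pattern, html, re.IGNORECASE) is not None

def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Extract all <loc> URLs from a sitemaps.org-format XML sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text for loc in root.iter(SITEMAP_NS + "loc")}
```

A page that appears in `sitemap_urls(...)` while `is_noindexed(...)` returns False for its HTML is exactly the contradiction being described: submitted for indexing while supposedly excluded from it.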
I understand that many of us disagree with what Jason is saying in his comments, but I don't think that we should be downvoting his comments below 1. He is at least making an attempt at a reasoned argument for his actions, and taking the time to post it. It makes the discussion hard to follow when all the detracting comments are a barely visible grey.
>We are in the process of NOINDEXING the pages that are below 300 words just to make Aaron happy... we actually had these noindexed before our last version and that got lost in the shuffle of the new launch (really, it did... when you write new code you can leave something out of the old code).
>
>I'm also getting a list of every page under 300 words and having the page managers build them out in 30 days or delete them.
It's not just Aaron Wall that sees something fishy.
But for those of us who are just tired of the whole drama, just change how it's done, or don't do it at all. Adding nofollow and not submitting auto-generated content in Mahalo's sitemap does not seem like a great amount of development work if you really want to change it.