
"We can't tell whether Jason is misleading us about the proportion of scrape-generated pages on Mahalo without access to any Mahalo page statistics."

Well, a Google site search for site:mahalo.com "Links Powered by Google" shows 553,000 indexed pages that use scraped search result content (with optimized page titles) to help pull in traffic. http://www.google.com/search?q=site%3Amahalo.com+Links++Powe...

and keep in mind that is just links from Google...there are also chunks of content from Google blog search, Twitter, and other sources (images, videos, news) on those pages

he is full of ____ if he is trying to get anyone to buy that doing the above is responsible for less than 1% of their traffic when Compete.com shows their search referral traffic as being ~ 60% of their referrals

It is not just a few (thousand) 100% auto-generated (experiment/stub/zebra/spam) pages that have scraped content on them...the above search shows Google estimates over a half million pages in their index contain content from their own search index...total regurgitation of 3rd party content :D

And let's not forget that:
1.) he is using people's optimized page titles as content on his pages
2.) search traffic monetizes better via ads than other forms of traffic...especially the search traffic that lands on a page for some random longtail keyword made up by arbitrarily combining chunks of 3rd party content mixed together and re-aggregated
3.) there is a $0 editorial cost to scraping these millions and millions of content snippets and re-displaying them
4.) he is making at least 5 figures a day from that content scraping...with 100% certainty.

his 1% remark is just another form of misinformation. nothing new there!




Where the hell is Google to regulate this sort of stuff? It's appalling that they'd take a back seat to this. No doubt they know it's going on.


Mahalo is monetized through AdSense (well, some affiliate links too). Plus, the fact that the result is usually incomplete means people are more likely to hit an ad to find a better answer.

So it's most likely a big cash cow for Google.

Which do you think is better for them to send a user to? Mahalo powered by AdSense or __________ powered by Tribal Fusion?

Oh, and just to clarify, I'm not saying Google is doing anything evil. They don't adjust their algos to help push AdSense sites higher. But when an AdSense site reaches the #1 position, they do appreciate the extra revenue.


You don't think it's wrong to let a high-profile site blatantly break the rules everyone else has to follow because it profits them?


of course, it's just I can see how it works out for Google too


This kind of argument is usually vacuous.

Reputation is worth billions to companies like Google. Google doesn't give a fuck about Mahalo.


It seems to fit the facts - what is an alternative explanation that fits the facts? That they just don't notice?


Yes, that they don't notice. People are very quick to say companies make decisions based upon money, but they don't pay attention to the fact that the money is usually far smaller than even the short term loss incurred by not doing the right thing.


precisely, I think this is a perfect page for what he is doing:

http://www.mahalo.com/card-games

SEVEN powered by Google areas.


Our pages are built by our community, so the quality will vary. That page doesn't have a "vertical manager" yet, but it will. Then we would build out the content a little more.

It's not a perfect system, people can't put multiple search boxes in there.

However, that page will never rank well in a search engine (unless by a fluke). In order to rank well you really need to have 500+ original words.

We're in the process of moving all pages to that standard. It's really a self-regulating thing: if our contributors make short pages they never rank and never make money. They get frustrated and we teach them how to make longer pages and some day they may rank.

... it's really not a problem, and the truth is we rank for three things well:

1. video game walkthroughs (typically 2-10,000 words!)
2. how to articles (typically 800 to 5,000 words)
3. question & answer pages (typically 300 to 10,000 words)

Isn't this basic SEO (and I'm no expert): build original content and you might rank. Build short pages and you don't rank.

All pages start short (just like wikipedia stubs do), and over time we make them longer. that's the normal process.


If I list a few dozen example "fluke rankings" will that mean you are full of crap once again?

Let me know how many need to be listed for you to see the pattern, and I will prove you factually incorrect once more.


> Our pages are built by our community

Your 'community' includes robots, according to the evidence Aaron collected.


Robots are people too.


Why has Jason's comment been downvoted? People don't consider it relevant? Amazing.


It is relevant, but I believe the downvotes are a response to the lack of an option to otherwise flag a comment as complete bullshit.


The mob wants blood..... be careful not to get dragged into it. :-)


Seven? I could spot 8 separate Google text ad blocks and 3 graphical Google ad blocks.


I didn't count ad blocks, just the keyword stuffing with the whole "powered by google" areas


Looks like mahalo is a prime candidate for a google PR = 0 and deletion from the index treatment.

What's keeping them?


I think vaksel's comment "Mahalo is monetized through Adsense..." at least partly explains it. Other than that, is anyone lodging policy-violation complaints when they find dodgy pages?



