This whole thing is insane... we have "stub" pages just like Wikipedia.
These are topic pages that people are working on and THEY DON'T RANK in search engines until we get the word count to around 300-500 words.
We are in the process of NOINDEXING the pages that are below 300 words just to make Aaron happy... we actually had these noindexed before our last version and that got lost in the shuffle of the new launch (really, it did... when you write new code you can leave something out of the old code).
I'm also getting a list of every page under 300 words and having the page managers build them out in 30 days or delete them.
Anyway, I thank Aaron for busting our chops and making us better!
The claims that we are "scraping" are absurd... we're using Google, Bing, Twitter, etc. APIs to build a comprehensive search page.
I don't know everything about SEO, but I don't understand this claim by Aaron. I think he is trying to start trouble for us... and maybe it will work. Thanks, pal!
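The noindex-by-word-count policy described above is mechanically simple. A minimal sketch of what such a filter could look like — this is purely illustrative, not Mahalo's actual code; the threshold and function names are assumptions:

```python
# Illustrative sketch of a word-count-based noindex policy.
# NOT Mahalo's real implementation; threshold and names are assumed.

WORD_THRESHOLD = 300  # pages below this get noindexed, per the comment

def word_count(text: str) -> int:
    """Count whitespace-separated words in the page's body text."""
    return len(text.split())

def robots_meta(text: str, threshold: int = WORD_THRESHOLD) -> str:
    """Return the robots meta tag a page of this length would get."""
    if word_count(text) < threshold:
        return '<meta name="robots" content="noindex">'
    return '<meta name="robots" content="index,follow">'
```

The point being: emitting one conditional meta tag per page is a small change, which is part of why the delay draws skepticism later in the thread.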
- If you don't want those pages indexed in Google then why are you submitting them in an XML sitemap?
- I have already shown examples of the 0 original content pages ranking, so how can you claim that they do not rank?
- You are not scraping directly; you are pulling from 3rd-party sites and using their content as content on your own site. Which is worse, because there is no way to opt out of it.
- My problem is not just with what you call stub pages, but with most of your pages. When you give people embed code to embed your content in their site you give them an iframe AND a direct link back to you. If you want me to stop highlighting the absurdity of it then perhaps you should hold yourself to the same standards as what you offer others. But you do just the opposite when you embed 3rd party content in your site. You slap a nofollow on the links and embed the content directly into the page (rather than in an iframe).
- Worth noting that every time I mention the above point you end up talking about stub pages or experiments or some other strategy to try to redirect attention. But in reality, what I am talking about is what you do on almost every page of your website.
1. Everything in the site is in the sitemap... it's not selective. It will be shortly.
2. They don't get traffic is my point... we look at any page that gets over 100 page views in a month and we build those pages out. So even if you find a page that ranks, it will not have traffic. If it has traffic, it gets built out.
3. We are not scraping, we are using search APIs.
4. I don't understand this issue with our widgets (which, to be honest, don't get used... it's a failed program).
5. This is simply false... our traffic comes from how-to articles, walkthroughs and Q&A. If you want to know what the top 10 pages are, they are things like how-to-play-guitar and Call of Duty walkthrough pages. Those things are 3-5k words!
- 1. Everything in the site is in the sitemap... it's not selective. It will be shortly.
Ah, so now you admit it was intentional. But good on you for (eventually? hopefully?) fixing it.
- 2. They don't get traffic is my point... we look at any page that gets over 100 page views in a month and we build those pages out. So even if you find a page that ranks, it will not have traffic. If it has traffic, it gets built out.
If a person has a quarter million pages that are each getting 5 visits, that is still a lot of traffic. Especially when the pages have zero editorial cost.
- 3. We are not scraping, we are using search APIs.
The end result is what people would typically call a "scraper site". It is irrelevant how it is created (whether you scrape directly or syndicate from somewhere else that does the scraping). The issue is a lack of editorial control (see your page about 13 year old rape) and a lack of citing sources with links.
- 4. I don't understand this issue with our widgets (which, to be honest, don't get used... it's a failed program).
Search engines have duplicate-content filters. If the content is within the page as HTML (as you do on Mahalo), then you can often outrank the original source for their own content. You would bypass this issue (and me mentioning it) if you only used an iframe to embed the content in your pages. But if you embed it directly into the HTML (as you are doing right now), then of course it is bogus.
- 5. This is simply false... our traffic comes from how-to articles, walkthroughs and Q&A. If you want to know what the top 10 pages are, they are things like how-to-play-guitar and Call of Duty walkthrough pages. Those things are 3-5k words!
I am not talking about your top 10 pages. I am talking about the bottom 300,000 pages, which in aggregate get far more traffic than the top 10 pages do. :D
- just lay off, dude... go troll someone else.
Not trolling at all. Just trying to give you valuable feedback, as you have claimed it to be publicly multiple times (unless you were lying when you stated that) :D
Does that mean that (for the remaining pages on the site)...
a.) the other scraped content which exists on the remaining pages will be put in an iframe (rather than as text on the page)
- OR -
b.) that you will be removing nofollow from the pages you are scraping content from?
Either you trust the content enough that you should link to it directly, or you should put it in an iframe such that search engines don't see it. Either route would likely be more akin to fair use than what you are currently doing (automatically scraping 3rd party content into your pages and using it to rank against the content creators, without permission, and without a way of opting out).
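The two alternatives laid out above can be made concrete. A hedged sketch, for illustration only — the markup, dimensions, and function names here are assumptions, not Mahalo's real embed code:

```python
# Sketch of the two embed alternatives Aaron describes.
# Markup and names are illustrative assumptions, not Mahalo's actual code.

def iframe_embed(url: str) -> str:
    """Option (a): put 3rd-party content in an iframe. Search engines
    attribute framed content to its own URL, so the host page cannot
    outrank the original source with it."""
    return f'<iframe src="{url}" width="600" height="400"></iframe>'

def inline_embed(snippet: str, source_url: str) -> str:
    """Option (b): inline the content in the page HTML, but credit the
    source with a plain followed link (no rel="nofollow")."""
    return (f'<blockquote>{snippet}</blockquote>\n'
            f'<a href="{source_url}">source</a>')
```

Either variant addresses the complaint: (a) keeps the content out of the host page's indexed HTML, while (b) keeps it inline but passes credit to the creator.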
After this many lies, do you actually believe he is planning on taking that step? In my book he's already used up his credibility. I won't believe anything he says until he says what he has done and someone else publicly verifies it. (He has lied enough that I don't think it worth my time to bother verifying anything he says. The bozo bit is well and truly flipped.)
What fraction of your traffic comes from those 3-5K-word articles? And what's your return on investment from those compared to, e.g., one of the pages that Aaron linked?
I'm curious! I'm in the content-creation business, and if what you're doing works, I'll either need to radically change what I do or to start copying you.
This is not a small matter; it destroys the internet for the rest of us. The internet becomes unusable and untrustworthy. And you are doing this in profitable collusion with Google, cynically going wherever Google pushes you.
> We are in the process of NOINDEXING the pages that are below 300 words just to make Aaron happy... we actually had these noindexed before our last version and that got lost in the shuffle of the new launch (really, it did... when you write new code you can leave something out of the old code).
It's been a few weeks: have we seen any evidence of this happening?
Not only are they NOT noindexing them (which they said they would do a couple years ago and a couple weeks ago), but they are still submitting them to Google via an XML sitemap
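Both claims above (no noindex, still in the XML sitemap) are independently verifiable rather than a matter of taking either side's word for it. A rough sketch of the checks, with deliberately simple parsing and the standard sitemaps.org namespace; fetching is omitted so the checks run on strings:

```python
# Sketch of how a third party could verify the noindex and sitemap claims.
# Parsing is deliberately simple; real pages may need a proper HTML parser.
import re
import xml.etree.ElementTree as ET

# Standard namespace from the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def is_noindexed(html: str) -> bool:
    """True if the page carries <meta name="robots" content="...noindex...">."""
    pattern = r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex'
    return re.search(pattern, html, re.IGNORECASE) is not None

def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Extract all <loc> URLs from a sitemaps.org-format XML sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text for loc in root.iter(SITEMAP_NS + "loc")}
```

A page that appears in `sitemap_urls(...)` while `is_noindexed(...)` returns False for its HTML is exactly the contradiction being described: submitted for indexing while supposedly excluded from it.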
I understand that many of us disagree with what Jason is saying in his comments, but I don't think that we should be downvoting his comments below 1. He is at least making an attempt at a reasoned argument for his actions, and taking the time to post it. It makes the discussion hard to follow when all the detracting comments are a barely visible grey.
>We are in the process of NOINDEXING the pages that are below 300 words just to make Aaron happy... we actually had these noindexed before our last version and that got lost in the shuffle of the new launch (really, it did... when you write new code you can leave something out of the old code).
>
>I'm also getting a list of every page under 300 words and having the page managers build them out in 30 days or delete them.
It's not just Aaron Wall that sees something fishy.
But for those of us who are just tired of the whole drama, just change how it's done, or don't do it at all. Adding nofollow and not submitting auto-generated content in Mahalo's sitemap does not seem like a great amount of development work if you really want to change it.