I just wanted to say thanks to all the people on Hacker News who asked for this option. We'll look at offering a "block site" option directly in the search results over time, but it takes longer to write, test, and launch that code.
In the meantime, use this extension to clean up your own search results and tell us which sites you don't want to see in Google.
Matt, I don’t understand why Google need to offer blocking as a user option. Any web netizen with some experience can tell at a glance the spammy and low-quality results on a SERP, but Google, with their vast knowledge and experience and their ability to monitor all sorts and aspects of user clicking behaviour, cannot?
I alluded to that in http://news.ycombinator.com/item?id=2218627 . People feel comfortable with Google removing blatant spam: hidden text, cloaking, sneaky JavaScript redirects, etc. People tend to feel less comfortable if they feel like Google is making an editorial decision.
If we get a good signal from this extension, or from offering block links in Google's search results, then it's much more similar to Gmail's spam algorithm, where an email is labelled as spam partly because a lot of users say it is, rather than because of some editorial decision on our part.
Matt, before we get to the point where potentially controversial editorial decisions will have to be made, I would imagine there are things that could be done automatically and uncontroversially.
For example, sometimes we see copies rank higher than originals. Why does that happen? Google know where they first saw a particular piece of content, don’t they? Why don’t they use that as a heavy ranking factor?
Consider this: I make a blog post on my relatively new blog examplesite.com. Techcrunch picks up on the article and immediately reposts it on their site. Which do you reckon would be the first to get indexed?
Now which is the original from Google's point of view? Relatively smaller blogs can take significantly longer to index than sites that have massive amounts of content moving about daily.
I have thought about that and I cannot believe it is really a problem.
My small negligible personal website notifies Google, Bing and Yahoo immediately and automatically as soon as I publish something new. It also publishes a feed. Even if the content is picked up and republished right away by a site that is indexed every minute, it should be possible to determine correctly the original publisher.
In some cases, I can think of more ways to determine the original publisher. And certainly Google can think of even more.
I publish an article at only-original-content.com. The article has some images that are served from only-original-content.com/images. Now only-copied-content.com takes my original article and republishes it. Since only-copied-content simply copied the HTML, the images are still served from only-original-content.com/images.
In that case it should be simple to determine who is the original publisher. Of course, only-original-content.com could simply be a CDN that only-copied-content.com uses for its static resources, but, again, it should be easy to determine whether that is the case.
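The image-hosting heuristic described above can be sketched in a few lines of Python. This is a toy illustration only (not a claim about how Google actually works), using the hypothetical domains from the example: a page whose embedded images are served from its own domain is more likely the original, while a verbatim copy still hotlinks the originator's image host.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class ImageSrcCollector(HTMLParser):
    """Collect the hostnames that a page's <img> tags are served from."""
    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            host = urlparse(dict(attrs).get("src", "")).netloc
            if host:
                self.hosts.append(host)

def likely_original(page_url, html):
    """Heuristic: return True if most of the page's images are
    self-hosted, False if they are hotlinked from another domain,
    None if the page has no images to give a signal."""
    parser = ImageSrcCollector()
    parser.feed(html)
    if not parser.hosts:
        return None
    own = urlparse(page_url).netloc
    foreign = [h for h in parser.hosts if h != own]
    return len(foreign) < len(parser.hosts) / 2

original_html = (
    '<p>article</p>'
    '<img src="http://only-original-content.com/images/fig.png">'
)
print(likely_original("http://only-original-content.com/post", original_html))   # True
print(likely_original("http://only-copied-content.com/repost", original_html))   # False
```

As noted, this breaks down when the "own" domain is really a shared CDN, so it could only ever be one signal among many.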
demetris, that's exactly what we improved with a recent algorithm change: http://www.mattcutts.com/blog/algorithm-change-launched/ . As you point out, that was a more straightforward change, and that's why we were able to launch that one first.
That said, if you wanted to share some examples where you're seeing copies rank higher than originals I'm happy to pass that on to the right folks. In fact, some of the right folks are already on this thread. :)
Hmm. That could be if Google Groups has changed the URL structure, which could make crawling it harder. Or because USENET/mailing lists don't always have a centralized/canonical location on the web, which makes dupe content more of a potential issue. It's not the usual "Website X copied my website" scenario.
Getting on a tangent here, but Google has a hard time crawling Google Groups? Have you tried emailing support@google.com? Just kidding but in all seriousness how is it that a Google property has bad SEO?
You already have the SafeSearch filter that can be toggled on and off to show you different search results. Why not an Editorial filter as well (perhaps disabled by default)?
I understand the fine line you have to walk, and I'm glad to see that you guys take it as seriously as you do.
Personally, I'm glad that you're putting this out as a user-controlled thing. I like the fact I'm able to get rid of results that aren't necessarily spam or SEO'ed garbage, but where I still know that I never want to see results from that site again.
However, if it is built in to the general results, could you also add a metric to Webmaster Tools showing the frequency with which your domain is reported? It would also be good if the blocking could have a timeout period so that sites can be given a chance to improve their behavior, rather than just deleting that domain forever.
A "web netizen" is intelligent. Google - as you're using the term - is a sophisticated piece of software that leverages much of what we know about AI, but it is not actually intelligent. The determination you're making that something is spam and not worth your time may be simple for you to make, but it's hard to get software to consistently make the same determination.
It's one thing for an individual user to ask not to be bothered by results from an entire domain. It's quite another for Google to make that call for EVERYONE. I think this is a great first step. If the evidence is overwhelmingly against a domain, I would hope that Google would use that as a strong negative signal against the domain to adjust the ranking of the site.
Another thing to keep in mind is how many unique search queries are entered into Google on a daily basis. I don't know what the figure is at Google, but an engineer at Bing said that 25% of their searches are ones they've never seen before. You can't get a human to look at each of these searches and remove the content farms; you have to do it algorithmically or else it will never work.
Getting users to do the work will help with the most egregious abuses. One thing that content farms are good at is coming up with answers for questions that don't have an answer available on-line. For example, the query "what's the personal cell phone number of <insert celebrity>?" will just take you to a spam farm because that's the only kind of website that claims to know the answer.
Experts exchange answers aren't nearly as bad as things like yahoo answers, etc. (PS - you know the answers are available at the very bottom of the page, right? I thought this was common knowledge, but just wanted to make sure the content was the reason behind the animosity and not that the information used to be inaccessible....)
> Experts exchange answers aren't nearly as bad as things like yahoo answers
While yahoo answers etc. might provide bad answers, experts exchange provides no answers at all. All you see is an open question with a nasty subscribe button that only works if you haven't already used it in the last 30 days.
So experts exchange is more than useless: It's just plain advertisement without any added value. It's the kind of stuff you expect in the ads section of a search engine, but definitely don't want to show up in your search results.
Well EE is nasty and I hate it, but they have the right if they choose to do so. Anyway, they didn't steal the content from anywhere. That being said, I'll probably add EE to my block list :)
Just scroll down past the stupid paywall when you come from google or bing. Or become an expert for free http://bit.ly/EEfree
I have been an expert there (not an employee) since 1998 and it is a great site. For a while now there have been free articles and blogs too. I agree with the haters that the paywall should go away, but that is the decision of the owners of EE, and those in the know can get full ad-free access by answering approx 3 questions a month. It took me more effort to have a reasonable flair at SO...
I don't mind their business model, but I can't understand why they appear near the top of so many search results.
Either they are very good at (mis-)using SEO techniques, or there are really many websites linking to them. However, I personally find it hard to believe that any author of a blog article or forum entry links to an EE answer voluntarily.
Yeah, those and WiseGeek for me. I keep on getting WiseGeek in searches (especially if I search for a question, which I occasionally do) and the content is complete and utter crap.
That's a very important question, as without an answer it seems like the original comment is just excellent Public Relations. *Did not mean to imply that it is PR :)
We've absolutely done similar things in the past, but that comment was the spark behind this most recent Chrome extension. I hope that's specific enough. :)
After using this for less than 12 hours, I think it's fantastic. I also think it could be vastly improved upon. What if you added reddit style "up" and "down" votes to my results?
Maybe it sounds silly at first pass, but if all my votes get passed back to Google, your algorithms could learn from an incredible crowd-source treasure trove of knowledge.
A black list wouldn't just help you find the bad ones - but up-boats could help you suss the good ones.
There's the concern of DIGG-style voting blocs, but perhaps even that could be detected by sufficiently sophisticated algorithms (voting patterns whose parity fell outside of statistical norms could be downgraded in quality).
I mean, sure, I'm already gaming the system in my head ... automated creation of accounts, automated up-boating (which I suspect happens on Reddit as well, despite being bad "reddiquette")
OK. Actually, maybe it's a terrible idea. Like DIGG with only down-votes. Seems like a potential treasure trove though.
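For what it's worth, the "parity in voting outside of statistical norms" idea can be sketched crudely. This is a toy model with invented users and vote data, assuming access to per-user vote records; real collusion detection would need far more robust statistics than a z-score cutoff:

```python
from itertools import combinations
from statistics import mean, stdev

def agreement(a, b):
    """Fraction of shared items on which two users voted the same way."""
    shared = set(a) & set(b)
    return sum(a[i] == b[i] for i in shared) / len(shared) if shared else 0.0

def suspicious_pairs(votes, z=2.0):
    """Flag user pairs whose vote agreement sits more than z standard
    deviations above the mean pairwise agreement -- a crude proxy for
    a coordinated voting bloc."""
    scores = {p: agreement(votes[p[0]], votes[p[1]])
              for p in combinations(sorted(votes), 2)}
    m, s = mean(scores.values()), stdev(scores.values())
    return [p for p, sc in scores.items() if sc > m + z * s]

# Invented data: u1 and u2 vote identically (a "bloc"); the rest vary.
votes = {
    "u1": {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1},
    "u2": {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1},
    "u3": {1: 1, 2: -1, 3: 1, 4: -1, 5: 1, 6: -1},
    "u4": {1: -1, 2: 1, 3: -1, 4: 1, 5: -1, 6: 1},
    "u5": {1: 1, 2: 1, 3: -1, 4: -1, 5: 1, 6: -1},
}
print(suspicious_pairs(votes))  # [('u1', 'u2')]
```

Of course, as the comment above notes, a determined gamer would just randomize the ring's votes slightly, which is exactly why this stays a toy.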
In addition to removing the blocked site from the search results would it be possible to be able to remove a site's contribution to pagerank? I feel that this would probably be more effective in the long run...
Matt, do you think this will be abused at all? I.e., organized blocking of sites en masse to push them down the rankings?
Is your block list something that will be synced so that it is available everywhere you use Chrome and eventually just synced with your Google account?
On top of counting the block votes, Google could check if the site is crippled by AdSense links. This could filter some shoot-down-my-competitor efforts.