I think their terms of use say they may pass along the contents of form fields. On a Google search results page, your search query is sitting in a text field.
Suppose it uses all text fields on the web without treating Google specially.
Being as charitable as possible here, I'm willing to believe that some well-meaning engineer coded this up without special-casing Google, tested it out, found that it worked amazingly well, and then launched it. This seems unlikely, but possible.
However, somewhere along the line someone must have known that the biggest benefit of this signal was recreating Google results. I'm not willing to believe that no one figured this out even if it wasn't the initial intent. At which point there's an ethical dilemma. At Google, a system like this wouldn't launch.
Regardless of the mechanism, I don't believe that nobody at Bing knows that this is what's going on. Maybe it's a cynical attempt to get around robots.txt. Maybe it's an honest mistake that gradually became a dishonest mistake, but I'm not willing to believe that they are oblivious.
If you type the query [site:nytimes.com] into Google News, you've recreated a different presentation of a feed of latest news from the NYTimes. It is inherent in the search business that you're collaging material from elsewhere. And for certain heavily-qualified searches – long-tail, few mentions, hapax legomenon/'googlewhack' – a single source is likely to stick out.
Google is unavoidably a giant signal-source on the web. Even if Microsoft instead sent unique keywords to contract writers to build out findable summary/directory web pages one-by-one, what would those writers do? Research via other search engines, starting with Google, and be heavily influenced by the few (or top) results they found, highlighting the same sites. So your results would still percolate outward, via a slower, more expensive, more manual process. (Would that process, laundered through time and multiple agents, meet your ethical standards?)
Such is the nature of Google's position today. As Rich Skrenta of Blekko has put it: "The net isn't a directed graph. It's not a tree. It's a single point labeled G connected to 10 billion destination pages."
A little of Google's proprietary wisdom is leaking back out. The amount seems small compared to all the freely-offered info Google sucked in to create that wisdom. And the proprietary wisdom is leaking back out via the same sort of bulk, automated mining of implicitly expressed preferences for which Google itself is famous. So to me this seems more like karmic balance than an ethical transgression against Google.
There is a huge difference between recreating Google results and incorporating a Google honeypot link into their results through legitimate means. One link isn't a search result; a search result is an ordered list of results. If it's whole search results, then Bing's got some questions to answer and deserves some bad press. Otherwise, not so much.
Really? I doubt replicating Google's results for long-tail queries is the biggest benefit. User clicks seem like a useful signal of relevance for the same reason links from other sites are a useful signal of relevance.
Long-tail queries are just the ones for which Google could easily demonstrate the effect without any confounding variables. I'd imagine this signal affects all of their rankings, weighted by volume much like their own click data.
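To make the "clicks as a relevance signal" point concrete, here is a minimal sketch of how observed clicks might be aggregated and blended into a ranking score. Everything here (function names, the 0.3 weight, the smoothing constants) is an illustrative assumption, not a description of Bing's or Google's actual pipeline; it just shows why a handful of clicks on an otherwise unheard-of long-tail query can dominate the ranking:

```python
from collections import defaultdict

def aggregate_clicks(click_log):
    """Aggregate a log of (query, url, clicked) tuples into
    per-(query, url) click and impression counts."""
    stats = defaultdict(lambda: [0, 0])  # (query, url) -> [clicks, impressions]
    for query, url, clicked in click_log:
        stats[(query, url)][1] += 1
        if clicked:
            stats[(query, url)][0] += 1
    return stats

def blended_score(base_score, clicks, impressions, weight=0.3):
    """Blend a link-based base score with a smoothed click-through rate.
    Laplace smoothing keeps rarely-seen pairs from getting extreme CTRs."""
    ctr = (clicks + 1) / (impressions + 2)
    return (1 - weight) * base_score + weight * ctr

# Hypothetical long-tail query with almost no click data:
log = [
    ("tarsorrhaphy", "example.org/medical-article", True),
    ("tarsorrhaphy", "example.org/medical-article", True),
    ("tarsorrhaphy", "example.net/spam-page", False),
]
stats = aggregate_clicks(log)
clicks, impressions = stats[("tarsorrhaphy", "example.org/medical-article")]
print(blended_score(0.5, clicks, impressions))
```

With so few impressions, even two clicks swing the blended score noticeably, which is exactly why a honeypot long-tail query makes the signal easy to demonstrate, and why at high volume the same mechanism would mostly just reinforce ordinary relevance.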