Why is Google so hysterically hypocritical about Bing using its public data? (roughlydrafted.com)
182 points by HardyLeung on Feb 2, 2011 | 128 comments



I read half of this. It's late, and I don't have the time or energy to refute point by point every single wrong thing that is written in that half.

It's easy to sum up. Yes, Google indexes hundreds of millions of sites. It does so in order for other people to be able to search and find those sites, which is important to their owners. It's a symbiotic relationship, not theft of intellectual property.

Google has spent billions of dollars in manpower and physical capacity in order to be able to do that. Also, each and every one of those sites can very easily say "don't index me" with a simple robots.txt file on their site. Some do, most don't, because most sites find it valuable for Google to index them.
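To make the "don't index me" opt-out concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` agent name, and the example.com URLs are all made up for illustration; they are not any real site's rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site:
# every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the (hypothetical) rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

print(allowed("http://example.com/index.html"))
print(allowed("http://example.com/private/page"))
```

Note that this only tells a well-behaved crawler what the site owner asked for; nothing in the protocol enforces it.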

Meanwhile, Microsoft is trying to compete in the same space as Google. They are also spending lots of money and manpower to build their search engine. But, by using Google's search results to improve their own product, they are acting as a parasite. Google gets no benefit from Microsoft using their data.

So to sum up my own summary: Google is a mass symbiote, Microsoft is a parasite.


Microsoft do NOT steal Google's search results. They use clickstream data from Bing toolbar users to improve their search results.

What happened is that Google engineers engineered a use case in which the signal from that clickstream looks like stealing of search results. In other words, Microsoft uses Google users' behaviour in its ranking algorithm. But that's not unethical, and Google is doing the same thing.


> They use clickstream data from Bing toolbar users to improve their search results.

If that leads to Bing absorbing Google's results, and e.g. suggesting spelling corrections they would never have figured out except that Google thought of them first, then they are indeed stealing results, whether they meant to or not, and need to stop.

George Harrison didn't mean to copy the song "He's So Fine" when he wrote his song "My Sweet Lord," but he lost the lawsuit anyway and had to pay damages. Whether you call it "clickstream" or "user behavior," Bing is incorporating an association that could only have come from Google, and Google's robots.txt makes it very clear that robots are not allowed to mine search results.

> But that's not unethical, and Google is doing the same thing.

As has been mentioned many times, Google does not take user behavior from the toolbar as a signal for ranking.


> If that leads to Bing absorbing Google's results, and e.g. suggesting spelling corrections they would never have figured out except that Google thought of them first, then they are indeed stealing results, whether they meant to or not, and need to stop.

Even if that's indirect information? For example, the Bing toolbar doesn't really say (in Gargamel's American voice for effect), "Nyhaahaha! I see these google search results! I will steal!" No, rather it says, "Ahh, after the result of a query to this site, we then leave the site to go to this page. If that query and this destination page appear in aggregate, Bing should take notice."
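A rough sketch of that "appear in aggregate" idea, in Python. This is not Bing's actual implementation, which nobody in this thread has seen; the function name, the `min_count` threshold, and the sample events are all hypothetical.

```python
from collections import Counter

def aggregate_clicks(events, min_count=2):
    """Count (query, destination URL) pairs and keep those seen often enough.

    `events` is an iterable of (query, destination_url) tuples, one per
    observed "searched for X, then went to page Y" trail.
    `min_count` is an assumed noise threshold, not a known Bing setting.
    """
    counts = Counter(events)
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Made-up sample trails from three imaginary toolbar users.
events = [
    ("python tutorial", "http://example.com/learn-python"),
    ("python tutorial", "http://example.com/learn-python"),
    ("python tutorial", "http://example.org/other"),
    ("cheap flights", "http://example.net/flights"),
]

print(aggregate_clicks(events))
```

Only the pair seen twice survives the threshold here. With a nonsense query that nobody else ever types, a handful of clicks would be the only signal available, which is the corner case the experiment was built around.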

When I first learned how the Bing toolbar did this, I had two reactions: "Oh, that's clever. It's like engaging your users to help you be a directed crawler. It's 'querytext', which is not unlike google's innovation, 'linktext'." and then "But I would still never install the Bing toolbar. Man, that thing is a sad clown show." Honestly, it's one of Bing's smarter ideas.

There is only so far pure crawling and statistical analysis can go; there simply isn't enough data there and everyone knows how to use that data to great effect. Every search engine is incorporating new realtime communications and user behavior streams into their search results. Google certainly does this, albeit without a toolbar. Bing simply has the entire Microsoft software stack to lobby for help, so if someone opts into the Bing toolbar, they can opt into submitting additional information to improve the Bing index.

I'm not a big fan of MS or Bing, but in this they are only culpable for being clever. They are using querytext to improve relevance. I'm told you could make the same stunt work from a Wikipedia search box, leading to a wikipedia page.


Microsoft uses behaviour

I doubt they are only targeting Google. But I completely agree, the data is not being mined from Google, it's a relationship for search term a to page b created by the user, not stolen from Google.


Google is the one creating the association, not the user. The relationship between term a and page b was created by Google, and confirmed by a user. People don't click on search results that aren't there.


This is the most important comment I've read so far on this issue. What everything comes down to is whether user generated associations between searches and results via clicks are fair game as a signal.

As this comment points out, there is a decent argument that searchers on Google or other engines aren't the ones creating the associations, and therefore perhaps Bing has no right to use that data.


Google is not creating the association. Users are creating the associations, Google is discovering them and presenting them back to other users, and subsequent users are confirming the associations. In this case Google engineers, masquerading as users, created associations, then Google presented those associations back to the user, and Bing noticed the association between the term and the page.

The relationship between a term and a page was not created by Google, it was created by users. Google just indexes everything and makes note of these associations, but it does not create the link in the first place.


Users are creating the associations, Google is discovering them and presenting them back to other users, and subsequent users are confirming the associations.

Going down this road of debate, we'll be getting into the semantics of "inventing" vs. "discovering."

Turning data into ranking is the whole purpose of a search engine, just as turning data into theory is the purpose of science, and turning experience into art is the purpose of art. Essentially here, we're debating whether search ranking is more like science, in which there's a correct answer that you are uncovering, or like art, in which all of the product is subjective.

Having worked in the field for five years, I'd argue that it is far closer to art than science. Google's rankings are its subjective determination, and the courts have agreed that Google's rankings are its constitutionally protected speech.

Though the data may all ultimately come from the human-created internet, the transformation of that data is still important, and subjective. To claim otherwise is to miss the whole point of search technology in the first place.


My issue with it is this: Had Google not created a better algorithm would those users be clicking on those links? If Google shut down today would Microsoft be able to associate those sites with the search terms?

I should think the answer to the above is 'no' in both cases, which is why this is cheating. It's probably not illegal, but I find the practice to be unethical.

Ask yourself this: If Google shut down today, would Microsoft be providing more relevant, equally relevant or less relevant results for those searches? If the answer is less relevant, then I think it's clear there's been a lapse of ethics.


Ask yourself this: If Stackoverflow shut down today, would Google be providing more relevant, equally relevant or less relevant results for programming related searches? If the answer is less relevant, then I think it's clear there's been a lapse of ethics.


The difference is that Stackoverflow says "please index us" while Google says "please don't index us".


and they don't need to index or crawl google pages to do what they are doing.


I don't think it's so clear. Is drafting unethical?

In any case, for all you know, if everyone started using the Bing Toolbar, it may provide better clickstream data, causing more relevant Bing results.


But the effect is the same as intentional scraping and outright stealing: search results that could have only come from Google are appearing as Bing results.

Bing needs to blacklist Google from its clickstream. Simple.


...the effect is the same as intentional scraping and outright stealing: search results that could have only come from Google are appearing as Bing results...

That's one effect. While it's vivid, it might be a tiny side-effect only notable in contrived cases.

Overall, this kind of URL-after-URL signal, extracted from every participating user, and every trail through both search sites and non-search sites, might be discovering valuable terms-in-preceding-URL-to-later-visited-URL associations. These associations might result in many search improvements, other than the one-for-one result porting Google's experiment has found. We don't really know the relative magnitude of porting-results versus other-benefits, yet I think that's important to the analysis.

If a useful automated or user-driven process generates a little indirect infringement around the edges, is that enough to demand the process be stopped entirely? Note, that's not the standard Google wants applied to user uploads to YouTube, or excerpting of news and websites onto Google services. Google says: "defend yourself, by adding opt-outs (robots.txt) or sending takedown notices, and we'll undo the incidental infringements eventually".


the effect is the same as intentional scraping and outright stealing

The google engineers intentionally sent this click data to Bing, so is Bing really stealing? It's odd to act surprised when Bing uses the data that was intentionally sent to it. Bing could specifically ignore Google search results pages when it is tracking clicks, but is that legitimate? Google scrapes everything, why shouldn't Bing?


They were testing to see if it was true. Many users are doing it, bing is taking data for non-google engineers too.


The point of collecting click data is not to target google engineers, it is to collect data from masses of people doing regular searches and to improve them by seeing which links get clicked on, so obviously Bing is "taking data for non-google engineers". Furthermore, there's no indication that google search results pages are even distinguished from other pages in this.

In fact, the data that it takes from Google engineers for carefully engineered corner-case searches is the exception.


I don't even think they should just blacklist Google. They should just respect robots.txt.

edit: I should have clarified. I know that the Bing crawler likely respects robots.txt, but if they are using clickstream info to build their index, it seems right that they should respect robots.txt there as well, no?


I'm pretty sure the Bing Crawler does respect robots.txt. The data Bing collected didn't come from spidering Google.


You could strongly argue that collecting clickstream and other user browser session info via a toolbar is not a form of web robot (crawler, spider, etc.), and thus robots.txt does not apply.


I agree with your comments that toolbars should respect the robots.txt because even if a human is doing the crawling, it is still an automated system that is indexing information from that site. I would not want toolbars attempting to send data back to Bing based on my queries on a company Intranet or a site that would normally not be indexed. Personal data entered into what the toolbar thought was a query field could be sent onward as well even if the robots.txt on the site restricted it. I think they should respect robots.txt in this case even if they are only monitoring user behavior.


Would Bing then get called out by Google for the inevitable 'anticompetitive' lowering of Google's Bing search rankings?


Actually, Google denies that they do the same thing pretty emphatically. Basically, they claim that the same test in reverse would not work, because the data stream from their toolbar in a reversed test would not affect their algorithms in this manner.

But agreed that it seems like a very small nit to be magnified the way it has. Indeed, why they don't do this, and why we should care that Bing does, doesn't seem to be directly addressed other than by hand-wavery and PR speak about the research they've put into their algorithms and such.


I do not understand this.

* Are Microsoft saying end users are naturally clicking on nonsense phrases invented by Google?

* If not, what are they saying?


What's not to understand? Microsoft is collecting search phrases from searches people make through the IE search box. Then they are tying that data to the next page people click to after the search. Google intentionally polluted their data using odd phrases and asked users to click on their bogus results. Given that the only data Bing would have for those strange phrases is polluted crap from Googlers, it's not surprising that their engine, which uses user-supplied data, would attempt to return the only match it can find.


Since nailer's comment on my other post can't be directly replied to.

From Google's mouth:

http://googleblog.blogspot.com/2011/02/microsofts-bing-uses-...

"We gave 20 of our engineers laptops with a fresh install of Microsoft Windows running Internet Explorer 8 with Bing Toolbar installed. As part of the install process, we opted in to the “Suggested Sites” feature of IE8, and we accepted the default options for the Bing Toolbar.

We asked these engineers to enter the synthetic queries into the search box on the Google home page, and click on the results"

In all honesty I'm a bit wrong as they did say they used the google box on the page.


What may be misleading about Google's statement is that it paints the picture of engineers hunched over laptops patiently typing in nonsense queries a keystroke at a time.

I guess I just have trouble seeing it taking more than about 3 minutes before 20 of the top industry engineers figured out a way to automate the process. Which is pretty long compared to the 2 seconds it would take for them to start thinking of ways to improve the SEO.

It's the human factor in Google's "experiment" that just doesn't fit. If they wanted a controlled approach, they would have written an application and run it and logged everything. Instead, it appears that they provided laptops so that the engineers could experiment and innovate their way to exploiting the Bing toolbar.


> In all honesty I'm a bit wrong as they did say they used the google box on the page.

Thanks - that's what I've been getting at. This isn't data entry into the Bing toolbar, it's into a non-Bing page when one has the toolbar installed.


> What's not to understand? Microsoft is collecting search phrases from searches people make through the IE search box.

This is not correct. From Google:

"We asked these engineers to enter the queries into the search box on the Google home page"

Google did not enter the data into the IE search box.

Edit: I see you replied to yourself acknowledging the mistake - please ignore this then!


> Google asked users to click on their bogus results.

Got a citation for that?


We asked these engineers to enter the synthetic queries into the search box on the Google home page, and click on the results, i.e., the results we inserted

http://googleblog.blogspot.com/2011/02/microsofts-bing-uses-...


Engineers are generally not considered end users.


The test wasn't natural. A number of Google people did click on the links artificially associated with the nonsense phrases, while they had Bing's toolbar active.

If the Bing toolbar is picking up on this sort of thing generically (i.e. picking keywords out of the query-string on any page and associating them with clicked links, though I'm not sure how it could with a useful degree of accuracy in a way that couldn't be "maliciously" gamed by underhand SEO activities) then I see nothing wrong in it as long as the users have knowingly opted in to their activity being analysed in this way. It would just be indexing keywords and content just as a web spider would.

If it is specifically detecting that it is on a Google page, and/or other search competitors, then the issue is much more cloudy.


As it's been pointed out, Bing compares more than just the clickstream datapoints. You have to imagine they provide some PageRank/domain weighting to the relevance. So if you set up a dummy site with no existing pagerank/weight, and performed the same experiment, you likely would not see the same results. However, since you can imagine Google is heavily weighted, those data points score high and can rapidly reflect in Bing search results.


But it only worked 7% of the time despite 20 Google engineers' best efforts, the honeypots being ranked number 1, and the data repeatedly sent to Bing.


The Google engineers who planted the phrases were running the Bing toolbar themselves. That part isn't in question.


So MS are saying Google themselves uploaded the data to MS by using Bing toolbar?

Does Google agree their engineers did this?


Yes. From the original article yesterday:

This all happened in December. When the experiment was ready, about 20 Google engineers were told to run the test queries from laptops at home, using Internet Explorer, with Suggested Sites and the Bing Toolbar both enabled. They were also told to click on the top results. They started on December 17. By December 31, some of the results started appearing on Bing.


No. There's a difference between having the toolbar installed but typing something into the google.com directly (which actually occurred) and typing something into the Bing toolbar.


Even if you exclusively use Google's web interface to search, the Bing toolbar gathers anonymous data about your browsing (if you've got that option enabled). It's fairly clear about that when you install it. To my knowledge, all of these search engine toolbars gather statistics on a broader scale than specifically what's typed into their search fields, Google's included.


Dude, I've seen your comments on this thread. Several of them are basically asking people to confirm facts everybody agrees on. You should come better prepared for the next discussion.

Sorry to be an ass. But you're wasting my time on this site, since I have to wade through your questions to get to the interesting ones.


There's something most of HN are missing. You are indeed being rude by not letting me point it out:

* Google having the Bing toolbar installed and entering the search into Bing toolbar is one thing (and I'd expect MS to be using the data)

* Google having the Bing toolbar installed and entering the search into the Google home page is a very different thing (and I'd expect MS to be using the data)

Judging from the moderation in this thread, people seem to think the first happened.

According to Google, it did not. No other source contravenes this.

Sorry if you think me pointing this out is bad. Perhaps your efforts would be better reporting all the non-hacker stories on the front page?


I know this has been posited, but has it been confirmed anywhere that this is the case?


Google gets no benefit from Microsoft using their data.

What happened to the Google line of "the more happy searching people we have online the better" (usually brought out in justification of their android investment)?

Shouldn't the same logic apply? Especially since at least half of the people using Bing aren't doing so voluntarily § shouldn't google be willing to trade that one search for having everyone happy on the internet and consuming more of it?

Is an incremental increase in Bing's search quality really going to take market share away from google? I'm skeptical.

§ 50% of Bing's traffic comes from fb, msn and windows live mail: http://techie-buzz.com/tech-news/google-is-4th-largest-traff...


Also websites can opt-out of having their content indexed.

How can Google opt-out of having its search index copied by Microsoft?

The funniest argument in this has been "Bing hasn't got the data, Google has an unfair advantage etc" as if Microsoft didn't exist before Google.

Microsoft have had years headstart to get their online division doing something worthwhile, but they still can't get it right.


No one has shown any evidence that Microsoft has directly copied Google's search index, or even violated the directives in google.com's robots.txt file.

What Microsoft apparently has done is to infer relationships from Google SRPs to various sites by way of information gathered through the Bing toolbar (and possibly other documented and user opt-in channels w/in IE). Using toolbar collected signals is well-established practice -- that's how Google obtains information regarding site performance.


The only thing I haven't seen is this. Ok, using the clickstream data is an established practice, but does Google use clickstream data from users making searches on OTHER search engines, or just their own?

This is the crux for me, Bing is using data from Google, does Google use data from Bing or only themselves?

Even if all true, I don't see it as unethical business practice. Definitely a marketing black eye, but not much else.


The information flows one way, from Google to Bing.

Google is careful not to do this.


Put up a terms of use or force users to accept a license agreement. They do no such thing today. So I think you'd have a hard time saying Microsoft has done anything legally wrong here.


Google already has terms of service. You agree to them by using Google. http://www.google.com/accounts/

5.3 You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google. You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) and shall ensure that you comply with the instructions set out in any robots.txt file present on the Services.


In this case I don't think Microsoft violates Google's TOS because it is not directly consuming the "service" provided by Google, but rather peeking at what the user has done (and by installing the toolbar the user has already agreed to let Bing do it).


Imagine the Google terms of service said, 'Users must not share any information we provide with anyone else'. Does that mean that use of the Microsoft toolbar would violate these terms and Google could sue Microsoft to stop them using the 'clickstream' data?


We're discussing morality, not legality. Just because it's legal doesn't mean it's right.


They've done as much legally wrong as bundling IE with windows.

It's most certainly morally wrong.

I want to see a project started to r/e the bing toolbar, and create something to send rubbish data back to bing to screw up their search index. That'd be awesome to see.


<meta name=robots content="noindex,nofollow">?


I'm pretty sure microsoft doesn't care...

They're capturing what users click on, which doesn't include any information about the wishes of the website owner.


Given the case of misspellings, Bing appear to associate what the user types into a text box with the link they subsequently follow.

I think this counts as extracting information from the Google results page.

<meta name=robots> is designed to prevent search engines automatically extracting information from web pages. By involving a human in the process, Bing are (presumably) avoiding this rule. Is this a good thing?

I accept that Bing don't access the page content itself, so wouldn't be able to see the <meta> tag if it was there.
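For reference, a crawler that does read the page content could honor that tag roughly like this. It is a minimal sketch using Python's standard `html.parser`; the class and function names are made up for illustration, and nothing here describes how Bing or any toolbar actually works.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name=robots ...> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex,nofollow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

def robots_directives(html: str) -> set:
    """Return the set of robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

page = '<html><head><meta name=robots content="noindex,nofollow"></head></html>'
print(robots_directives(page))
```

A process that only sees the URL sequence, and never fetches or parses the page, has no point at which a check like this could run.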


> I read half of this. It's late, and I don't have the time or energy to refute point by point every single wrong thing that is written in that half.

It's dilger, why would you bother reading that tripe at all?


I don't know. I actually think he has a pretty good point here. I'm generally positive on Google but I'm embarrassed by their public tantrum. If I was advising a startup with similar grievances against a competitor, I'd strongly encourage them to avoid this distraction. It just feels like bruised ego. And so far, I think Google's cause has been hurt more than helped.


I didn't even read the article. It's late and I'm not going to bother addressing any of the author's points.

But he's wrong.

I can't understand why people make posts like this.


Regardless of whether you think Google is more or less evil than Microsoft, what does it matter how many billions they spent?

It's silly to hold onto some murky moral click-ownership here. It would still be silly, even if it hadn't given Google billions of dollars in profit, every quarter.


Read the rest of it. The last half ties the argument together quite nicely.


Wrong. Google uses the link-out of other sites to rank the linked site.

So if I have a site that review sites about x, google will use my links to rank the sites about x. It will benefit google, sites about x, advertisers for x, users searching for x. But NEVER me.

Now, replace "me" with "google" and "google" with "bing", and you will see how you and Google are being hypocritical.


You may not benefit from the indexing of your link, but you certainly benefit from the indexing of other people's links. Unless, of course, you don't use search on the internet.


By your argument, Google benefits from the doings of Bing, unless of course they don't search on the internet. (The internet meaning Bing, as opposed to your comment, where the internet meant Google.)


I had to reread the original Google article to verify a point. The signal Bing is alleged to be using is that users CLICKED on the result that came up ... not the results that just came up first in Google upon a search query. And these are users who agreed to install the Bing toolbar!

Here's a hypothesis. I suspect Bing is making use of data that is typed in the toolbar or browser's search box ... not data captured from text typed directly into Google's search page. This might explain why their "honeypot-take" rate was only 8%. This is a subtle point. So ... imagine if you are the coder who wrote this signal collection feature. Would you capture "term in search box,next URL clicked" OR would you capture "if search box search engine == BING or Yahoo or something else then capture term in search box, next URL clicked".

Let me propose an alternative experiment. The test clickers should have clicked on the second or third (or some position N where N > 1) link. This would have demonstrated whether Bing is using the actual click information or the search results themselves directly. The former seems completely fair on Bing's part. The latter might be debatable. My point is this ... the Google geniuses fail to distinguish these two cases and are muddling it up for the PR drones. This wastes everyone's time and productivity. Moreover, they belittle the hard work of engineers and scientists. It is sad that this is what it has come to.


From your comment: I suspect Bing is making use of data that is typed in the toolbar or browser's search box ... not data captured from text typed directly into Google's search page.

From the official Google blog: We asked these engineers to enter the synthetic queries into the search box on the Google home page, and click on the results, i.e., the results we inserted.


The search text is in the referring URL no matter where it was typed in. It's captured without ever looking at Google.
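A minimal sketch of that point in Python, using the standard `urllib.parse`. It assumes the search terms travel in a `q` query-string parameter, and the example URL and search term are made up; whether any toolbar actually does this is not something this sketch can show.

```python
from urllib.parse import urlparse, parse_qs

def query_from_referrer(referrer: str, param: str = "q"):
    """Pull the search terms out of a results-page URL, or None if absent.

    `param` is the assumed name of the query-string parameter that
    carries the search text.
    """
    values = parse_qs(urlparse(referrer).query).get(param)
    return values[0] if values else None

# Made-up results-page URL with a made-up search term.
print(query_from_referrer("http://www.google.com/search?q=some+nonsense+term&hl=en"))
```

The function never requests the results page; it only inspects the URL string the browser already has.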


Oh jeez, not again. This is the Microsoft we know and love to hate, less so now that it is falling apart and must be pitied as an underdog.

Let's see:

Market cap: Microsoft - 239.49B, Google - 195.50B. Revenue: Microsoft - 66.69B, Google - 29.32B. Profit margin: Microsoft - 30.84%, Google - 29.01%.

Falling apart. Right...


All it took to piece together the author's ignorance was the

"Daniel Eran Dilger is the author of “Snow Leopard Server (Developer Reference),”

quote down at the end. I'm sure his line of thought was: "All my friends at Starbucks have MacBook Pros, so Microsoft must be doing really badly."

Probably the same reason half his argument seemed to involve Android jealousy...


Goodness, your comment seems to imply you have an issue with Apple products, their policies, and/or Steve Jobs. Yeah, the guy writes about OS X and Apple, but he's been a tech consultant for nearly two decades. He's worked with various technologies, from various companies. A quick Google search (heh) would have revealed this. Or, what? Did you believe he was some 20-something year old blogger regurgitating the same "FUD" you believe all Apple fanboys are spewing regarding their "Android jealousy"?


Are you kidding yourself consciously or just 8 years behind the news? @_@

http://www.wolframalpha.com/input/?i=goog%20msft

Note how MSFT is swinging around 0% growth, while GOOG is steadily between +20% and +60% growth, in regard to share prices.

The market has spoken some time ago, and the trend doesn't look like changing anytime soon.


You fail to understand how to interpret a "relative growth" chart. Google is not "steadily between +20% and +60% growth."

Starting from an arbitrary point in time picked by WA, it shows how much you would be up at some other arbitrary point in time. If you look at the historical returns of MSFT vs GOOG there is no competition. Your original $10,000 in MSFT would be worth many millions and your GOOG stock would be worth less than $100,000.
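To illustrate how much the baseline matters on a relative growth chart, here is a small Python sketch. The prices are invented round numbers, not real MSFT or GOOG data, and `relative_growth` is a hypothetical helper name.

```python
def relative_growth(prices, base_index=0):
    """Percent change of each price relative to the price at `base_index`."""
    base = prices[base_index]
    return [(p - base) * 100 / base for p in prices]

# Invented price series, purely for illustration.
prices = [100, 150, 120]

print(relative_growth(prices))                # baseline = first point
print(relative_growth(prices, base_index=1))  # baseline = second point
```

The same series reads as "up 20%" at the last point with the first baseline and "down 20%" with the second, so the percentages on such a chart say nothing until you know where the zero line was anchored.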


That's a valid point, but it bears no relation to the matter of the current trend -- which is what OP argued about, right? And the trend seems to be: GOOG price is growing, MSFT price has been flat for some years. So what if it skyrocketed years ago, when somebody else held the helm and the market was not saturated with countless `me-too' products?

BTW., switch to the 5 year range in WA and you still see the trend.


Microsoft is still a behemoth, with its fingers in a lot of pies, but unlike Apple and Google, it has not managed to grab onto the future, yet. The future is going mobile, and services. Microsoft has so far lost the mobile race, and the big services do not use MS servers.

They are looking in worse shape to face the future than they ever have in the last 20 years.


I partially agree, but with some caveats:

1. Future is also in the games, and here Microsoft has a very competitive product (xbox/kinect)

2. mobile race is not over yet, and WP7 is conceptually a good product, combining polished UI (aka iOS) with at least some hardware variety (aka Android). It just came late to the game, but it is not yet clear if too late. I personally think they can catch up with throwing enough money into it.

3. We will see when they release OS for tablets. They are late here as well, but similar to point 2), maybe not too late.


2. mobile race is not over yet, and WP7 is conceptually a good product, combining polished UI (aka iOS) with at least some hardware variety (aka Android).

I would say that WP7 has a superior conception to iOS, and I am a hardcore iPhone user. My phone is really about my communications and managing my data away from home. Apps are only a means to that end. Right now iOS is pretty App centric. WP7 on the other hand is personal data centric, which in the long run will provide better functionality for "managing my data away from home" with less friction.


2. & 3.) It's very, very late to come to these races with nothing truly compelling to show. I think they've already lost, but, nothing's sure...


It's never too late, the game changes too much. Just 5 years ago Palm and Blackberry owned the entire smart phone market.

I wouldn't want to be in MS's position, but with enough money, patience and perseverance you never know what they might do and what may be next.


Expect a big push for WP7 once it is available on Verizon/Sprint. Microsoft in underdog mode is a very different and dangerous beast. See Windows, Xbox, Word, Excel, NT, Exchange, .NET, IE. The first few versions were laughed at and dismissed, but later ones mostly blew away the competition (yes, IE was better than Netscape in the 4.0 and 5.0 days, while Netscape took forever to release their 6.0).

Apple likes to retreat into its niche where it controls everything (helps avoid antitrust regulators breathing down its neck) and has high margins. This is already happening with the iPhone. Android, while gaining in popularity, lacks polish in the UI and performance.

Even with much better raw hardware, Android performs worse compared to Windows Phone 7. See http://pockethacks.com/windows-phone-7-smoother-than-dual-co...

The going empirical formula is that version 3 of their products is when they start becoming good. While WP7 is technically version 7, it's actually a complete reboot of the platform with zero backward compatibility. They're taking the middle route between iPhone and Android, having variety in hardware but strict hardware requirements to prevent fragmentation, plus a controlled app ecosystem. That said, they don't have a good tablet strategy. Maybe Windows 8 will really be tablet optimized, or maybe tablet hardware will catch up to make it run smoothly (the Win UI is still a hard problem on tablets though). Or maybe they will port the Windows Phone UI to tablets.

In short, very interesting times are ahead in the mobile/tablet space and it would be a mistake to write Microsoft off.


Good points. I don't think we can write them off yet.

But it's pretty clear they lack the leadership at this point to move quickly to remedy their shortcomings. That wasn't true at the come-from-behind points in the past (Gates was still around).


First article I've read in a while that's taking Microsoft's side in the whole situation. Every paragraph of this screams publicity. I am just wondering if this is a PR tactic. I would be happy to see a balanced review of both companies' behavior, but this is just one-sided bashing.

> Shame on your pretentious, obnoxious, indefensibly egregious double standard in the field of using public information to turn a profit.

Ironically, Bing is doing the same, and unlike content creators, who can opt out by putting up a robots.txt file, Google cannot opt out of this because, like it or not, the toolbar is always sending back information.

Google indexes public content, but how it aggregates and displays it is their own work. The ordering, etc. is Google's work, not the content creators'. Copying it is just like copying IP. I would love to see this go to court and be fought.


I think that the Phineas Barnum assumption is pretty much valid and this article proves it.


This is starting to get interesting.

Google makes money aggregating other people's content. What happens when people aggregate Google's content? What's fair?


Google don't object to meta search engines (e.g. Dogpile) aggregating their results with those of other search engines.

Google are objecting to Bing seemingly passing off Google results as Bing's own (and not including their ads, no doubt).

It's clearly legitimate for Bing to index a Google results page if it follows a link to it (assuming absence of robots.txt etc.) and vice versa.

Does it matter that Bing obtained the Google results via a search from a human not affiliated with Bing?

Does it matter that Google blocks crawlers from accessing its search results, using robots.txt?

Does it matter that Google's result pages don't include the noindex instruction?

Tbh, if I was Google I'd mark my results as noindex and see whether Bing respected that.
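
For what it's worth, checking for that instruction is cheap. A minimal sketch in Python, assuming noindex is delivered as a standard robots meta tag in the page HTML (it can also be sent as an X-Robots-Tag HTTP header, which this ignores). The class and function names are made up for illustration and say nothing about how Bing or Google actually process pages:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html):
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(page))
```

Whether a toolbar-fed pipeline would ever run a check like this is exactly the open question.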


That's why it's interesting.

You say Google adds value. Others don't think so.

Why can't I add value to Google's content and present it in my own way?


I didn't say that Google adds value. Are you talking about Google Search, Google News, or another of their activities as "adding value" or not? Please don't put words into my mouth.


I'm talking about Google taking pictures of my house, snippets of books I've written, or other people's web pages with copies of articles I've written, and displaying them without my permission. They seem to think that's ok because they "added value".

Well now people are adding value to Google's products and presenting them in a new way. This opens up a can of worms for Google.


"fair use" is a well-established practice for all of those things except (possibly) search results


Explain what you mean when you say that Google makes money by aggregating other people's content? Google makes money by indexing other people's content and driving traffic to other people's sites. That hardly seems like a bad thing.


http://books.google.com is considered one of the more egregious examples, with many authors and publicists having complained about Google taking samples of the content and posting it online. I did a Google search a while back on a graphics issue, and got the answer I needed from a Google book search result. You could argue (as Google does) that they are really just helping promote the book. But I have no need to buy - I got the answer for free.


Would you have bought that book just to get that one answer? Would you have known the book even had the information you needed without Google's help?


Google News is an aggregator, hence Rupert Murdoch's problem with it.


Google doesn't make any money off of Google News.


Maybe not directly, but I am quite sure that Google uses Google News to enhance search results (they know more about you and your interest), and so indirectly makes money.


Hmm. It might just be me, but I get the feeling that they aren't exactly fans of Google?!


Which doesn't make his point less or more valid. Just because you dislike something doesn't mean you can't provide insight and valid opinions about it.


They're fans of Apple. It's the guy from Appleinsider writing and Google is clearly considered to be an enemy of Apple.


I find it amusing that Dilger mentions Overture, which bought alltheweb from FAST in 2003. FAST, of course, was later acquired by Microsoft in 2008. The number of connections among the major search companies is rather amusing, if not necessarily surprising.


I have a feeling that people's reaction to this is biased by the fact that one of the two companies is Microsoft. What happened here is not ethically correct, but come on guys, we must be honest. This is business, it's about billions, it's about market share. It's not news that money is not ethical.

Google collects data too, Google does its own shit like everybody else, like Microsoft. Google has its own toolbar too. And this whole story seems more like a marketing move against Bing.

What I want to happen now is more competition. Instead of crying, Google should work hard to be unbeatable and competitive. History teaches us that a copy is never better than the original.


I'll try to make my point again. There is one school of thought that Google is profiting by farming information from the Internet, literally on the back of many people's labor. You can prove this by running a "Google Sting", which is known as "Google bombing". By having many people set up honeypots that link an arbitrary keyword to a certain URL, you seem to be able to fool Google into artificially associating that keyword with the URL, thus "proving" Google is profiting from your labor. I don't really buy this logic. But Google's accusation that Bing steals from them seems very close to this line of framing.


Is anyone else bothered by the fact that the article switches back and forth between serif and sans-serif font?


Nope. Just you.



http://www.google.com/robots.txt

First two lines:

User-agent: *

Disallow: /search
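
To see what those two lines mean for a crawler that honors them, here's a rough sketch using Python's standard urllib.robotparser. The crawler name "ExampleBot" is hypothetical, and only the two lines quoted above are fed in, not the full file:

```python
from urllib import robotparser

# Only the two lines quoted above, not Google's complete robots.txt.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /search",
]


def make_parser(lines):
    """Build a parser from in-memory robots.txt lines (no network access)."""
    parser = robotparser.RobotFileParser()
    parser.parse(lines)
    return parser


google_rules = make_parser(ROBOTS_LINES)

if __name__ == "__main__":
    # "ExampleBot" is a made-up user agent; the * rule applies to it.
    for url in ("http://www.google.com/search?q=hn", "http://www.google.com/"):
        print(url, google_rules.can_fetch("ExampleBot", url))
```

Under these rules a compliant crawler may fetch the home page but not anything under /search, which is where the results pages live.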


There is no robot involved. An actual human instructs their user agent to retrieve a document on Google. A piece of code, running in that user agent, does something. If anyone has a complaint here, it is the user, not Google. (Google really really doesn't want to spin this as "MS is spying on our users" because as soon as they say that some person is going to point out "Google is spying on the entire world" and that person will be right.)


Thank you - I've been looking for the metaphor to express my impression of this whole matter, and you've given it to me.

It's Spy vs Spy:

http://en.wikipedia.org/wiki/Spy_vs._Spy

...a pitched battle between the opposite-but-indistinguishable agents of two superpowers, on a plane so far removed from the realm of actual people and their concerns that the customers don't even appear in the frame.


Although strictly speaking you're right, it breaks the intent of robots.txt, as the users' clickstream is fed into the server-side indexing engine alongside robot-crawled data.

Arguing that, according to the strict letter of the spec, robots.txt only has to be obeyed by true crawlers does hold water: strictly speaking, Bing could disregard robots.txt altogether, since it's certainly not legally enforceable. But the intent of robots.txt is clear, and Bing should be trying to obey it wherever possible.

In this case it appears the keyword was associated because it was typed into the Google search box, and from there led to the faked destination. IMHO, when the clickstream was analysed, it should have disregarded clickstreams that pass through a robots.txt-excluded page, as these could establish associations that were not supposed to be revealed by crawlers.

We ought to hold Bing and any search engine to the highest standard when talking about crawling etiquette.

Also, see http://news.ycombinator.com/item?id=2169817
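
The filtering I'm suggesting could look roughly like this. It's only a sketch under my own assumptions: the clickstream is modelled as (source URL, destination URL) pairs, robots.txt rules are already loaded per host, and every name here (including the "ExampleIndexer" agent) is invented. Nobody outside Microsoft knows what Bing's pipeline really looks like.

```python
from urllib import robotparser
from urllib.parse import urlsplit


def build_parser(robots_lines):
    """Parse robots.txt lines that were fetched earlier (no network here)."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser


def filter_clickstream(hops, robots_by_host, agent="ExampleIndexer"):
    """Drop hops whose source page a compliant crawler could not have fetched.

    hops: iterable of (source_url, destination_url) pairs.
    robots_by_host: dict mapping host name to a RobotFileParser.
    Hosts without known rules are treated as unrestricted.
    """
    kept = []
    for source, destination in hops:
        rules = robots_by_host.get(urlsplit(source).netloc)
        if rules is not None and not rules.can_fetch(agent, source):
            continue  # association came from a robots.txt-excluded page
        kept.append((source, destination))
    return kept


if __name__ == "__main__":
    rules = {"www.google.com": build_parser(["User-agent: *", "Disallow: /search"])}
    stream = [
        ("http://www.google.com/search?q=madeupword", "http://example.com/honeypot"),
        ("http://example.org/blog", "http://example.com/article"),
    ]
    print(filter_clickstream(stream, rules))
```

With Google's /search rule loaded, the hop that starts on a results page is dropped and the ordinary link click is kept.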


If I'm google and I block /search I want no company to collect the data.

If I'm a site that blocks /dynamic because I want to make sure my stats are accurate, not factoring bots, and I want to keep the load down on slow-generating pages, then I'm perfectly happy with the data being collected.

Is it the intent of robots.txt itself to block the clickstream data, or is it just google's intent?


By that logic IE could send the entire page of search results back to Microsoft and it wouldn't be a violation of robots.txt. Because hey, it was a user-agent doing it, not a robot! If this is so in the clear, why not have Microsoft do that also?


except that that's not what robots.txt is for. Otherwise your adblock extension could be excluded via robots.txt

edit: bing toolbar is effectively an extension for MSIE. robots.txt is designed to stop spiders. Spiders aren't extensions. Therefore that's not what robots.txt is for. What did I say wrong to get downvoted?


>Otherwise your adblock extension could be excluded via robots.txt

I think that's what most people thought was wrong in your comment.

Care to elaborate on how an adblock extension could be excluded via robots.txt in that fictive scenario?


If extensions checked to see if they had permission to alter/interfere/access webpages by looking it up in robots.txt, which is what is being suggested by people who say "Bing toolbar violated robots.txt", then that same mechanism could be used to specify:

    User-Agent:adblock
    Disallow: /

Or even, if the "User-Agent: *" is supposed to tell Bing Toolbar not to play about with the page, then why doesn't that also apply to all your other extensions/addons/BHOs/plugins, whatever you want to call them?
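
To make the hypothetical concrete: if an extension identified itself as "adblock" and actually consulted robots.txt (no real extension does this as far as I know, that's the whole point), the lookup would behave like this sketch using Python's urllib.robotparser. The agent names are made up.

```python
from urllib import robotparser

# The hypothetical rule from above, targeting one extension by name.
HYPOTHETICAL_RULES = [
    "User-Agent: adblock",
    "Disallow: /",
]

extension_rules = robotparser.RobotFileParser()
extension_rules.parse(HYPOTHETICAL_RULES)


def extension_allowed(agent, url):
    """Would an extension calling itself `agent` be allowed to touch `url`?"""
    return extension_rules.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://example.com/page"
    print("adblock:", extension_allowed("adblock", page))
    print("SomeOtherExtension:", extension_allowed("SomeOtherExtension", page))
```

The named extension is shut out of the whole site while any other agent is unaffected, so a site could turn off your ad blocker with two lines of text.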


It should, if they're sending data back to the mothership. Alexa's toolbar among others should respect robots.txt


Dilger first asks "Bing is using Google's search results to improve its own. But what’s wrong with that?"

If Bing uses Google's search results, then we effectively only have one major search engine. Yahoo has switched to the Bing engine, and Bing gets a big chunk of its results from Google. So we have a search engine monopoly. That's what's wrong with that.

> "This is the company that indexes blogs [...], and then makes all this information available without consent"

He first asks what's wrong with Bing copying Google's results, because it's public content, then says it's immoral to index public content. Double morals? Logic fail?


Where did he say it is immoral to index public content?

Google complains about Bing using public content. Google uses public content.

Snarky question? Fail?


Imagine a playground. Young, pre-alpha nerds writhing in chaos. Little arms flailing. Faces red. "Poopy face". "Your mommy smells".

Nerd fight.


The world's largest scraper is complaining about being scraped...when Yelp and all the rest of the local businesses directories realized they were being scraped and the data was being used to google's advantage, it doesn't appear that Google gave two shits. Tables have turned, and it's like someone stole the handball from the recess yard.


> when Yelp and all the rest of the local businesses directories realized they were being scraped and the data was being used to google's advantage,

http://yelp.com/robots.txt

Not a mention of GoogleBot.


well, they're presented with either 1) killing their organic search traffic by blocking Google or 2) allowing the crawlers but not being able to stop their data from being aggregated and included in local results.

Yelp's API specifically says their data cannot be aggregated with other sources. When presented with that, Google simply said they weren't using the API, they were scraping.

Apparently scraping is a free pass to do whatever the hell you want if you're Google.


I think Google is being disingenuous with their "bing is copying my data" story.

However, please avoid linking to roughlydrafted.com. At the risk of going ad-hominem, I just wanted to point out that if you have read any of the material on the site, you will know that it is just full of flamebait articles with an absurd Apple fixation.

Everything Android does is copying Apple. Everything Microsoft does is wrong and stupid. If Flash is not supported on iPhone it is because it is the morally correct thing to do. When Apple announced the iPhone without an SDK, roughlydrafted called developers stupid for asking for an SDK and said that javascript on the web is the new SDK. This post however takes the cake: you must side with Bing because Google is a greater threat to Apple than Microsoft. Seriously, how does Overture even enter into this discussion? It is tiresome to see this site linked to all over the web.


Responding to merely two points, one hopes Daniel isn't as stupid as his writing. "This is the company that indexes blogs, newspapers, and both digital and physical books, and then makes all this information available without consent in the contexts of its ads and paid search space, and is dismissal of anyone who objects to Google’s ultra liberal sense of copyright. "

robots.txt and noindex http://www.google.com/support/webmasters/bin/answer.py?hl=en...

And "Install the Google Toolbar and do a search of Bing, and Google actually directs your clickstream back for its own analysis"

Google/Matt Cutts have been very open about what they do and don't use that information for, and they aren't using click tracking.


Not really. If Google wants to publicly shame Microsoft it can and is. They're basically saying that Bing is the new "Let me Google that for you". Since they're not currently proceeding with legal motions it's basically just a smear campaign to make people think twice about Bing.

And there's truth to it.


Microsoft did nothing wrong. http://en.wikipedia.org/wiki/Deep_linking


Google makes money out of being the #1 search engine. Competition from others is one thing. Competition from someone stealing your source code (open source or not) is a whole different story. That should be reason enough, in my humble opinion.


Oh I obviously misunderstood the article. I'm so sorry.



