As the article concluded, the differences were small (to me they're pretty much insignificant). It doesn't really matter to me if the target is the first link, as long as it's on the first page.
I think better comparisons would be: who has the better infrastructure, who can deliver results faster, serve more queries, use less energy, and crawl faster? I think Google is hard to beat here, and it's where the others have to think harder.
To me, the most interesting aspect of the post was the comparison between Bing and the old Live. I'd have liked to see a Live-Google comparison and perhaps Yahoo! as well, since it's often thrown around as a comparable alternative.
This seems to confirm my idea that Bing is less a revamp to the search engine and more a rebranding for Microsoft. Bing is certainly more memorable than Live or MSN were, and it's replacing Live and other MS brands in several non-search areas (Virtual Earth -> Bing Maps for Enterprise, for example). That's certainly nothing new for Microsoft, but this time they seem to be marketing it as a search engine change to get people to try the engine and hopefully switch from the big G.
I think they realized after Project Mojave (i.e., calling Vista something other than Vista made people like it at least a little bit more for 30 seconds) that branding things is kinda important. Of course, if the tech sucks, people will still hate it, but at least they can say they gave it a really half-hearted effort if anyone ever says they didn't try.
I thought this was an interesting study but, after spending a few minutes trying to find patterns in the data, I finally hacked up a quick null hypothesis graph in Excel and it looked virtually indistinguishable:
The shape of this kind of graph usually isn't very meaningful by itself for showing subtle patterns -- you're right that the extremes of the graph always look like that. In the original article, if you line up the Bing-vs-Google and Bing-vs-Live graphs, the intercept near the middle of the graph is a little further left. That's all I got from it. I'm assuming there's an ANOVA table associated with this study that we're not seeing, and the probabilities there must be a little more compelling.
Maybe a two-phase Turking might be interesting, too: first have the turkers come up with a few parameters for each query (e.g. current events, general info, celebrities, pr0n), then compare search engine results for those queries. That would help pick out the more subtle patterns that you were looking for in the original graph.
With only 6 to 8 participants, the study definitely didn't have the power to find anything but the grossest differences between query types. It's not about study design so much as the need to recruit more people.
Hey - I left a post on your blog, but I don't agree with your assessment. Generating a similar shape with a random process doesn't imply that our data has no signal, and if you actually lay out the graphs side by side you'll notice that our graph is somewhat shifted to the right.
I trust their statistical significance calculations... but at a glance, the distribution barely looks different from what I'd expect if the turkers picked at random (either on purpose or because search tastes are at some point arbitrary).
It'd be interesting to include such a graph, where all ratings are drawn at random (but with the same slightly-vs.-much proportions) for visual comparison.
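Something like this would generate that random baseline. It's only a rough sketch: the -2/-1/+1/+2 scoring, the 30% share of "much" ratings, and the 150 queries are all placeholders I made up, not numbers from the article, so swap in the real proportions before comparing.

```python
import random

def null_preference_curve(n_queries=150, workers_per_query=(6, 8), p_much=0.3, seed=0):
    """Per-query mean preference scores when every rating is pure chance.

    All defaults are hypothetical placeholders, not values from the study.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_queries):
        n_workers = rng.randint(*workers_per_query)  # 6 to 8 workers per query
        ratings = []
        for _ in range(n_workers):
            side = rng.choice((-1, 1))                    # which engine "wins": coin flip
            strength = 2 if rng.random() < p_much else 1  # "much" vs. "slightly"
            ratings.append(side * strength)
        means.append(sum(ratings) / n_workers)
    return sorted(means)  # sorted, so it can be plotted like the article's graph

curve = null_preference_curve()
```

Plot `curve` next to the real per-query scores: if the real curve's zero crossing sits clearly off-center compared to this one, that's the shift people are arguing about.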
Also: what would happen if all the individual queries on which the preference isn't statistically significant were discarded, or repeated until the preference becomes significant? (Or is "6-8 workers" enough for significance?)
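On the "6-8 workers" question, a back-of-the-envelope check, assuming you collapse each rating to a plain vote for one engine (ignoring slightly-vs.-much, which is my simplification and probably not what the authors did), is an exact two-sided sign test:

```python
from math import comb

def sign_test_p(wins, n):
    """Two-sided exact binomial p-value for `wins` out of `n` votes under a fair coin."""
    k = max(wins, n - wins)                       # size of the larger camp
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)                     # double the one-sided tail, cap at 1

for n in (6, 7, 8):
    for wins in range(n, n // 2, -1):
        print(n, wins, round(sign_test_p(wins, n), 4))
```

Under that simplification, a single query only gets below p = 0.05 when the workers are unanimous: 6-0 gives 0.03125, while 7-1 out of 8 gives about 0.07. So per-query significance is basically out of reach with this few raters, and repeating a query until it comes out significant would just be fishing.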
You have a point there. Also, what would happen if they showed the turkers two lists that both came from Google? They would probably get similar results. Same for Bing, same for Yahoo.
From the test site that compared results from G, B and Y a few days ago, I felt the results were so similar that I couldn't bring myself to click one as the best engine because I didn't think I was making an objective choice.