>At that time amazing amounts of CPU and engineer time would be spent on verifying the quality of all algorithm and data changes, both during development and during launch. Changes that were unevaluated or were a net negative on quality would only be launched under very exceptional circumstances.
Maybe they should actually try using the product instead of relying on automated algorithms to generate metrics when they evaluate changes. For example, it's quite obvious that something is going wrong with the prioritisation of place-name text for the UK at the moment: http://imgur.com/kL0GHfe (for the non-UK readers: no, there is not a large important city called "Town Centre"; Edinburgh is far larger than Kirkcaldy; and Birmingham is the second-largest city in the country and is not labeled at all).
I'm quite curious about how much real human testing they actually do. I've always had the impression that testing by actual humans is the antithesis of Google culture (automate everything and reduce everything to comparable numerical metrics).
It's a good thing that I didn't say anything about "automated algorithms used to generate metrics", then...
This was machine-aided human evaluation.
Search quality evaluation can't really be done without humans in the loop. If you had an algorithm that could distinguish between a good result and a bad result, you wouldn't use it to evaluate results. You'd use it to generate the results.