robrenaud on Nov 13, 2012 | on: Large Scale Distributed Deep Networks (by Jeff Dea...

You could also ask many humans and average their responses to obtain a golden label, then see how well any particular human agrees with that average. If there is a lot of variance in the human answers, then it's possible for a machine to have better than (individual) human performance, even on a human-labelled data set.
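
To make that concrete, here is a rough sketch in Python. The votes and the machine predictions are made-up toy numbers, and the function names are hypothetical. It also assumes binary labels, with "average" meaning the mean vote rounded at 0.5.

```python
def golden_labels(votes):
    """Average each item's binary votes and round at 0.5 to get a consensus label."""
    return [1 if sum(item) / len(item) >= 0.5 else 0 for item in votes]


def agreement(predicted, golden):
    """Fraction of items where the predicted labels match the consensus labels."""
    matches = sum(p == g for p, g in zip(predicted, golden))
    return matches / len(golden)


# Toy data (invented for illustration): rows are items, columns are 5 annotators.
votes = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0],
]

golden = golden_labels(votes)

# Score each annotator against the consensus they helped produce.
human_scores = [
    agreement([item[a] for item in votes], golden)
    for a in range(len(votes[0]))
]

# Hypothetical machine predictions for the same six items.
machine = [1, 0, 1, 0, 1, 1]
machine_score = agreement(machine, golden)

print("humans:", human_scores)
print("machine:", machine_score)
```

In this toy set every annotator disagrees with the consensus on two of the six items, while the machine disagrees on only one, so it scores higher than any single human even though the labels came entirely from humans. One caveat: each annotator's own vote is part of the consensus, which if anything flatters the human scores a little.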