The accuracy reported is top-5 accuracy: a model is considered correct on a test image if the expected label appears among its top 5 predicted labels. This does mitigate the multi-object issue quite a bit.
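For concreteness, here's a minimal sketch of how top-5 accuracy is typically computed; the names (`logits`, `labels`, `top5_accuracy`) are my own, not from the paper:

```python
import numpy as np

def top5_accuracy(logits, labels):
    """Fraction of examples whose true label is among the 5 highest-scoring predictions.

    logits: (N, num_classes) array of model scores
    labels: (N,) array of integer class indices
    """
    # Indices of the 5 largest scores per row (order among the 5 doesn't matter)
    top5 = np.argsort(logits, axis=1)[:, -5:]
    hits = np.any(top5 == labels[:, None], axis=1)
    return hits.mean()

# Tiny usage example: 3 images, 10 classes
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 10))
labels = np.array([2, 7, 4])
print(top5_accuracy(logits, labels))
```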
Most of the examples cited in the paper that their algorithm got wrong, I would also have gotten 'wrong.' I don't think I would guess 'restaurant' for any of the images with that label; I might have gotten the middle spotlight and the first letter opener right, but I'm not sure.
How do you explain that their performance is better than a human's? Is the difference in the obscure examples?
On the other hand, some of the images they got right have questionable predictions as well. For example, they labeled the geyser picture correctly, but their top labels also included "sandbar", "breakwater", and "leatherback turtle". A better scoring function, perhaps one that uses a label hierarchy to account for the very vague "restaurant" photos and the very specific dog-breed photos, might be helpful. Otherwise, it seems like we might be overfitting to the peculiarities of this dataset.
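Just to make the hierarchy idea concrete, here's a toy sketch of the kind of scoring I have in mind: give partial credit when a prediction shares an ancestor with the true label. The `parents` map and the scoring rule are made up for illustration; they aren't from the paper or from the actual WordNet hierarchy ImageNet uses.

```python
# Tiny made-up fragment of a label hierarchy (child -> parent).
parents = {
    "Eskimo dog": "working dog",
    "Siberian husky": "working dog",
    "working dog": "dog",
    "dog": "animal",
    "restaurant": "building",
    "building": None,
    "animal": None,
}

def ancestors(label):
    """Chain from a label up to the root of the toy hierarchy."""
    chain = []
    while label is not None:
        chain.append(label)
        label = parents.get(label)
    return chain

def hierarchy_score(predicted, true_label):
    """1.0 for an exact match, fractional credit for a shared ancestor, else 0."""
    if predicted == true_label:
        return 1.0
    pred_chain, true_chain = ancestors(predicted), ancestors(true_label)
    common = set(pred_chain) & set(true_chain)
    if not common:
        return 0.0
    # Credit falls off with how far the nearest shared ancestor is from the true label
    depth = min(true_chain.index(a) for a in common)
    return 1.0 / (1 + depth)

print(hierarchy_score("Siberian husky", "Eskimo dog"))  # 0.5: wrong breed, right kind of dog
print(hierarchy_score("restaurant", "Eskimo dog"))      # 0.0: no shared ancestor
```

Under a rule like this, confusing two husky-like breeds costs much less than predicting "leatherback turtle" for a geyser, which feels closer to how a human grader would score it.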