> To put into perspective just how extraordinary a 90% accuracy claim is, consider that a well-controlled 2015 paper by computer vision researchers Gil Levi and Tal Hassner finds that a convolutional neural net with the same architecture (AlexNet) is only able to guess the gender [5] of a face in a snapshot with 86.8% accuracy. [6]
This seems like a rather misleading comparison. I would be really surprised if CNNs, which can get <5% error on ImageNet across 1,000 classes, and which (as GANs) can generate nearly-photorealistic, clearly gendered faces, can't even distinguish gender at least that well. And guess what happens when you click through to fact-check this claim that CNNs do worse on a binary gender-prediction problem than on guessing among hundreds of categories? You see that the facial images used by Levi & Hassner are not remotely similar to a clean, uniform dataset of government-ID facial photographs: they are often extremely low quality, blurry, and taken at many angles or under poor lighting, and I can't even confidently guess the gender of several of the samples shown at the beginning and end of the paper, because, as the authors themselves say:
> These show that many of the mistakes made by our system are due to extremely challenging viewing conditions of some of the Adience benchmark images. Most notable are mistakes caused by blur or low resolution and occlusions (particularly from heavy makeup). Gender estimation mistakes also frequently occur for images of babies or very young children where obvious gender attributes are not yet visible.
It wouldn't surprise me, going off the samples, if human-level performance was closer to 86% than 100%, simply due to the noise in the dataset.
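To make concrete what kind of experiment is actually being compared here, a minimal sketch of the standard setup (assuming PyTorch/torchvision and a hypothetical folder of labeled face crops; Levi & Hassner used AlexNet, but any pretrained backbone illustrates the point, so this is not their code): the accuracy number that comes out of this measures the cleanliness of the dataset at least as much as it measures what CNNs can do.

```python
# Sketch only: assumes PyTorch + torchvision, and a hypothetical directory
# faces/{train,val}/{female,male}/ of labeled face crops.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard ImageNet-style preprocessing; blurry, occluded, or badly-lit crops
# will drag accuracy down regardless of architecture.
tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_dl = DataLoader(datasets.ImageFolder("faces/train", tfm), batch_size=64, shuffle=True)
val_dl = DataLoader(datasets.ImageFolder("faces/val", tfm), batch_size=64)

# Fine-tune a pretrained backbone with a 2-way (male/female) head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    model.train()
    for x, y in train_dl:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# Held-out accuracy: on clean, frontal, well-lit photos this will look very
# different than on Adience-style in-the-wild snapshots.
model.eval()
correct = total = 0
with torch.no_grad():
    for x, y in val_dl:
        correct += (model(x).argmax(1) == y).sum().item()
        total += y.numel()
print(f"val accuracy: {correct / total:.1%}")
```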
(It's also bizarre to make this claim shortly after presenting ChronoNet! So which is it: are CNNs such powerful learning algorithms that they can detect the subtlest biases and details in images, to the extent of easily dating photographs of random scenes to within a few years of when they were taken, and so none of their results are ever trustworthy; or are they so weak and dumb that they cannot even distinguish male from female faces, and so none of their results are ever trustworthy? You can't have it both ways.)
> It wouldn't surprise me, going off the samples, if human-level performance was closer to 86% than 100%, simply due to the noise in the dataset.
I think that's kind of the point! The idea that any agent -- human or machine -- can know someone's gender, criminal convictions, or anything else about their background just from a photograph is fundamentally flawed.
No, my point is that you can't use a hard dataset to say what is possible on an easy dataset. 'Here is a dataset of images processed into static: a CNN gets 50% on gender; QED, detecting criminality, personality, gender, or anything else is impossible'. This is obviously fallacious, yet it is what OP is doing.
> The idea that any agent -- human or machine -- can know someone's gender, criminal convictions, or anything else about their background just from a photograph is fundamentally flawed.
This is quite absurd. You think you can't know something about someone's gender from a photograph? Wow.
Personally, I find it entirely possible that criminality could be predicted at above-chance levels based on photographs. Humans are not Cartesian machines; we are biological beings. Violent and antisocial behavior is heritable, detectable in GWAS, and has been linked to many biological traits such as gender, age, and testosterone - hey, you know what else testosterone affects? A lot of things, including facial appearance. Hm...
Of course, maybe it can't be. But it's going to take more than some canned history about phrenology, and misleadingly cited ML research, to convince me that it can't and the original paper was wrong.
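And "above-chance" is at least a concrete, checkable claim, unlike "impossible". A quick sketch (assuming scipy; all numbers hypothetical) of how one would test whether a classifier's accuracy actually beats the majority-class base rate, rather than a naive 50/50 coin flip:

```python
# Sketch (hypothetical numbers, assumes scipy): "above chance" means beating the
# base rate of the majority class, not a 50/50 coin flip.
from scipy.stats import binomtest

n_test = 1000        # hypothetical held-out test set size
n_correct = 620      # hypothetical number of correct predictions
base_rate = 0.55     # hypothetical majority-class frequency = accuracy of always guessing it

result = binomtest(n_correct, n_test, p=base_rate, alternative="greater")
print(f"accuracy = {n_correct / n_test:.1%}, "
      f"p-value vs. always-guess-the-majority-class = {result.pvalue:.3g}")
```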
> This is quite absurd. You think you can't know something about someone's gender from a photograph? Wow.
No, I'm saying that neither humans nor machines can determine gender solely by looking at a picture, no matter how well they're trained. There will always be examples they get wrong. The problem is not that the machines aren't as good as humans. The problem is that they're both trying to do something that's impossible.
And predicting at "above-chance levels" isn't enough. The article goes into great detail about how this kind of inaccurate prediction can cause real human suffering.
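To make "isn't enough" concrete, here is a back-of-the-envelope calculation (all numbers purely hypothetical): when the predicted trait is rare, even a classifier well above chance mostly flags people who don't have it.

```python
# Hypothetical numbers only; Bayes' rule applied to a rare trait.
prevalence = 0.01    # 1% of the population has the trait being predicted
sensitivity = 0.80   # true-positive rate of the classifier
specificity = 0.90   # true-negative rate of the classifier

p_flagged = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
ppv = sensitivity * prevalence / p_flagged   # P(has trait | flagged)
print(f"P(actually has the trait | flagged) = {ppv:.1%}")   # ~7.5%: mostly false positives
```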
> No, I'm saying that neither humans nor machines can determine gender solely by looking at a picture, no matter how well they're trained. There will always be examples they get wrong.
This is irrelevant and dishonest. Don't go around making the factual claim that something can't be done when it manifestly can be done most of the time.
We can't know for sure, everyone agrees. But so what? It's still very interesting and potentially useful (or dangerous, depending on your point of view) to learn about correlations.