When I was in the advertising business, one of the core products I created was a brand-safety product, which basically kept advertisers from advertising on dodgy sites.
I messed around with algorithms that detected nudity (because big brand advertisers don't want their ads showing up on porn sites). One of the more interesting and simple-to-use ones is actually a simple averaging of the images across multiple samples. That one was easy to implement and gave relatively good results.
In the end though, I ended up not using it because text clustering algorithms worked better in classifying content.
"The training set for the skin filter consisted of 1,182,608 manually labeled skin pixels and 10,471,553 manually labeled non-skin pixels while the testing set consisted of 2,303,824 manually labeled skin pixels and 24,285,952 manually labeled non-skin pixels."
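For a sense of scale, here is a quick sketch of the class balance implied by those pixel counts. The counts come straight from the quote; the variable and function names are mine and purely illustrative.

```python
# Pixel counts as quoted above (manually labeled skin vs. non-skin pixels).
train_skin, train_non_skin = 1_182_608, 10_471_553
test_skin, test_non_skin = 2_303_824, 24_285_952


def skin_fraction(skin: int, non_skin: int) -> float:
    """Share of labeled pixels that are skin (hypothetical helper)."""
    return skin / (skin + non_skin)


print(f"train: {skin_fraction(train_skin, train_non_skin):.1%} skin")
print(f"test:  {skin_fraction(test_skin, test_non_skin):.1%} skin")
```

Skin pixels make up roughly a tenth of each set, so a filter that always answered "not skin" would already be right about 90% of the time at the pixel level.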
This could only be a very rough first pass at detection. Bathing suits can be very skimpy without being fully nude.
And social context plays a large role. For instance, distinguishing between a fat male's nipples and a small-chested female's nipples would be impossible without analyzing a lot more than skin color.
This doesn't seem like a very scalable approach to the problem. I would think if you wanted to capture all nudity (including monochromatic or illustrated), you would instead go at the problem from the angle of titillation. You could even round up images that are not necessarily human-based (fruit arranged provocatively, for instance).
How do these nudity detection APIs work? Is there crowdsourcing going on under the hood? Or are they using some clustering algorithm to detect a range of skin colors, so that if, say, 90% of the image falls in that range, it's nude?
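To make the "range of skin color" guess concrete, here is a minimal sketch of a skin-pixel ratio check. It is not how any particular API works. The RGB thresholds are one commonly cited rule of thumb for skin under daylight, the 0.9 cutoff just echoes the 90% figure in the question, and every name here (`is_skin`, `skin_ratio`, `looks_nude`) is made up for illustration.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255


def is_skin(pixel: Pixel) -> bool:
    """Rule-of-thumb RGB skin test (an assumption, not a trained model)."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15  # enough color spread
        and abs(r - g) > 15                   # red clearly differs from green
        and r > g and r > b                   # red is the dominant channel
    )


def skin_ratio(pixels: Iterable[Pixel]) -> float:
    """Fraction of pixels that pass the skin test (0.0 for an empty image)."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(is_skin(p) for p in pixels) / len(pixels)


def looks_nude(pixels: Iterable[Pixel], threshold: float = 0.9) -> bool:
    """Flag an image when the skin ratio reaches the (hypothetical) threshold."""
    return skin_ratio(pixels) >= threshold
```

A gray pixel such as `(128, 128, 128)` never passes `is_skin` because its channels are equal, so a check like this is blind to black-and-white images.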
To be fair, he may have tried that. The site is completely unusable on a phone, as it seems to make assumptions about the minimum screen resolution and doesn't let you zoom out the way most desktop sites do on a mobile screen. In fact, it's one of the most poorly designed sites I've stumbled upon in a long time in that regard (which is a great pity, as the content looked interesting).
Protip: If you film pornography in black and white, there is no such thing as "skin color."
If you remap the palette for a desaturated image, so that everyone's skin is green or magenta, are there any fewer penises penetrating vaginas in the image? If it's a horse and a fully clothed person's mouth, where is the algorithm for that?
How many pornography images do you see on the internet that are taken in black and white? How many are modified so that the people involved look like the Hulk? An algorithm like this does not need to be 100% bulletproof in every situation imaginable; it just needs to work most of the time.