What is the best way to programatically detect porn images? (2009) (stackoverflow.com)
142 points by romain_g on Oct 16, 2013 | 104 comments



I used pifilter (now WeSEE:Filter , http://corp.wesee.com/products/filter/) for a production, realtime, anonymous confession site (imeveryone.com) in 2010.

It cost, IIRC, a tenth of a cent per image URL. Rather than being based on skin tone, it was built around algorithms that specifically identify labia, anuses, penises, etc. REST API: send a URL, get back a yes/no/maybe. You decided what to do with the maybes.
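
For illustration, the yes/no/maybe flow might look like the sketch below; the endpoint, parameters, and response format are invented placeholders, not the actual pifilter/WeSEE API.

    import requests

    def classify_image(image_url, api_key):
        # Ask the filtering service for a verdict on one image URL.
        resp = requests.get(
            "https://filter.example.com/v1/classify",  # placeholder endpoint
            params={"url": image_url, "key": api_key},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()["verdict"]  # "yes", "no" or "maybe"

    def handle_upload(image_url, api_key):
        verdict = classify_image(image_url, api_key)
        if verdict == "yes":
            return "reject"
        if verdict == "maybe":
            return "queue_for_human_review"  # you decide what to do with the maybes
        return "accept"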

My experience:

- Before launch, I tested it with 4chan /b/ as a feed, and was able to produce a mostly clean version of /b/ with the exception of cartoon imagery.

- It could catch most of the stuff people tried to post to the site. Small breasted women (being that breasts are considered 'adult' in the US) was the only thing that would get through and wasn't a huge concern. Completely unmaintained public hair (as revealing as a black bikini) would also get through.

- Since people didn't know what I was testing with they didn't work around it (so nobody tried posting drawings or cartoons), but I imagine eg a photo of a prolapse might not trigger the anus detection as the shape would be too different.

- pifilter erred on the side of false negative, but one notable false positive: a pastrami sandwich.


I don't remember where I read this, but someone once recommended having a bot that posts the image to the 4chan /b/ feed and monitors the views and replies. Since porn images on /b/ usually had somewhat predictable replies, it would be a good way to augment another technique and prevent false positives. Funny idea, but not really sure how viable.


Using /b/ as an unpaid mechanical Turk alternative sounds intriguing yet profoundly disturbing.

Imagine being able to fire off a dozen /b/tard bots at the website or target of your choice.


> Small breasted women (being that breasts are considered 'adult' in the US) was the only thing that would get through and wasn't a huge concern.

But that could also be developing breasts on a youth, and that would mean the image is something you very much want to block and report.


"Completely unmaintained public hair". And now i'm wondering if we could disrupt the market with a Public Hair as a Service platform.


In The Olden Pre-Digital Days porn was either in print or on a television screen. Back then (we are talking two whole decades ago) experienced broadcast engineers could instantly spot porn just by catching a look at an oscilloscope (of which there were usually many in a machine room).

Notionally the oscilloscope would be there to show that the luminance and chroma were okay in the signal (i.e. it could be broadcast over the airwaves to look as intended at the other end - PAL/NTSC); however, porn and anything likely to be porn had a distinctive pattern on the oscilloscope screen. Should porn be suspected, the source material would obviously be patched through to a monitor 'just in case'.

Note that the oscilloscope was analog and that the image would be changing 25/30 times a second. Also, back then there were not so many false positives on broadcast TV, e.g. the kind of pop videos that today's audience deems artful rather than porn.

If I had to solve the problem programmatically I would find a retired broadcast engineer and start from there, with what can be learned from a 'scope.


It's probably related to the type of cameras available in the day rather than anything else...


I have developed an algorithm to detect such images, based on several articles published by research teams all over the world (it's incredible to see how many teams have tried to solve this problem!).

I found out that no single technique works great. If you want an efficient algorithm, you probably have to blend different ideas and compute a "nudity score" for each image. That's at least what I do.

I'd be happy to discuss how it works. Here are a few techniques used:

- color recognition (as discussed in other comments)

- Haar wavelets to detect specific shapes (that's what Facebook and others use to detect faces, for example)

- texture recognition (skin and wood may have the same colors but not the same texture)

- shape/contour recognition (machine learning of course)

- matching with a growing database of NSFW images

The algorithm is open for testing here: http://sightengine.com It works OK right now, but once version 2 is out it should really be great.
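
For illustration, here is a toy sketch of blending per-technique scores into one "nudity score"; the signal names, example values, and weights are made up, not the actual sightengine algorithm.

    def nudity_score(signals, weights=None):
        # Blend per-technique scores (each normalised to 0..1) into one number.
        weights = weights or {"skin": 0.3, "shape": 0.3, "texture": 0.2, "match": 0.2}
        return sum(weights[k] * signals.get(k, 0.0) for k in weights)

    # Example: outputs of the individual detectors listed above, each scaled to 0..1.
    scores = {"skin": 0.72, "shape": 0.40, "texture": 0.65, "match": 0.10}
    print(nudity_score(scores))  # ~0.49 -> maybe send to human review

A site could then accept images below one threshold, reject above another, and send the band in between to human moderators.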


Amazon Mechanical Turk has an adult-content marker specifically for this purpose. Lots of people have done the paperwork to qualify for adult-content jobs and the cost of having humans do it at scale is very low: https://requester.mturk.com/help/faq#can_explicit_offensive

Source: I helped implement a MT job to filter adult content for a large hosting company.


So I can literally get paid for looking at porn. Huh, who knew... :)


At the cost of your mental health, sure.

I recall reading an article about the human workers who had this job at Google... they had crap benefits, crap pay, and no mental healthcare. As a result a lot of the people in the field had depression and other mental issues.


Friend was a contractor on the YouTube filter team -- they lasted about 9 months before it became too much and they had to leave.

There was absolutely no reward in showing up to work, and some of the things they saw have likely scarred their memories forever


Challenge accepted!



I signed up as a worker on MT a few years ago just for kicks. Let's just say that labeling adult images got old real fast.

Only interesting tasks were the psychology ones, but they got boring too (think Daniel Kahneman's experiments repeated ad nauseam).

How people last on MT longer than a month is something that I don't understand.


This Quora answer describes one such person in the industry who watches porn all day and writes the most enticing copy for it.

http://qr.ae/NSSa0


And RSI as well I would imagine.


Source?.... or any hints as to what the article was called. Sounds interesting.


Isn't Obamacare supposed to cover mental health care? Seems like a solved problem.


Holy hell people, touchy aren't we.


I did this for my bachelor thesis for a company that shall remain unnamed. I am pretty confident that my approach works better than any of the answers posted on Stack Overflow.

I used the so-called Bag of Visual Words approach. At the time it was the state of the art in image recognition (now it's neural networks). You can read about it on Wikipedia. The only main change from the standard approach (SHIFT + k-means + histograms + SVM + chi2 kernel) was that I used a version of SHIFT that uses color features. In addition to this I used a second machine learning classifier based on the context of the picture. Who posted it? Is it a new user? What are the words in the title? How many views does the picture have?

In combination the two classifiers worked nearly flawlessly.
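
For anyone curious, a rough sketch of that kind of bag-of-visual-words pipeline with OpenCV and scikit-learn is below. It uses plain grayscale SIFT rather than the color variant described above, omits the second context-based classifier, and the paths/labels are placeholders, so treat it as an illustration rather than the thesis code.

    import cv2
    import numpy as np
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.metrics.pairwise import chi2_kernel
    from sklearn.svm import SVC

    sift = cv2.SIFT_create()

    def descriptors(path):
        # 128-dimensional SIFT descriptors for one image (empty array if none found).
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        return desc if desc is not None else np.empty((0, 128), np.float32)

    def bovw_histogram(desc, vocab):
        # Assign each descriptor to its nearest visual word, build a normalised histogram.
        words = vocab.predict(desc.astype(np.float32)) if len(desc) else []
        hist, _ = np.histogram(words, bins=np.arange(vocab.n_clusters + 1))
        return hist / max(hist.sum(), 1)

    def train(paths, labels, n_words=500):
        all_desc = np.vstack([descriptors(p) for p in paths])
        vocab = MiniBatchKMeans(n_clusters=n_words, random_state=0).fit(all_desc)  # visual vocabulary
        X = np.array([bovw_histogram(descriptors(p), vocab) for p in paths])
        clf = SVC(kernel="precomputed").fit(chi2_kernel(X, X), labels)  # SVM with chi2 kernel
        return vocab, clf, X

    def predict(path, vocab, clf, X_train):
        h = bovw_histogram(descriptors(path), vocab).reshape(1, -1)
        return clf.predict(chi2_kernel(h, X_train))[0]  # e.g. "porn" / "clean"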

Shortly after that, Chatroulette was having its porn problem and it was in the media that the founder was working on a porn filter. I sent an email to offer my help, but didn't get a reaction.


SHIFT or SIFT? What's SHIFT?


I think he means SIFT. I'm not aware of SHIFT either, and a look at his PAMI paper shows use of color features as well as various SIFT features.


This sounds quite interesting. Is there any of the research or code base that you can share? Or otherwise any references about the standard approach which you would recommend?


This software is free for non-commercial use: http://koen.me/research/colordescriptors/

You can find other implementations of varying quality if you Google for Bag of Visual Words. For the final classification, I would recommend scikit-learn.


It would be interesting to see the images that would be generated if you took that system and "ran it backwards", insofar as that's possible.


This is probably going to get downvoted, but if lots of people are not overzealous puritans and want some skin, the best overall system design that maximizes happiness and profit is probably sharding into

puritanweirdos.example.com with no skin showing between toes and top of turtleneck (edited to add no pokies either)

and

normalpeople.example.com with 99% of the human race

The best solution to a problem involving computers is sometimes computer related, but sometimes it is social. The puritans are never going to get along with the normal people anyway, so it's not like sharding them is going to hurt.

Another way to hack the system is not to hire or accept holier than thou puritans. Personality doesn't mesh with the team, doesn't fit culture, etc. You have to draw the line somewhere, and weirdos on either end should get cut, so no CP or animals at one extreme, and no holy rollers on the other extreme.

The final social hack is that it's kind of like dealing with bullies via appeasement. So they're blocking reasonable stuff today; tomorrow they want to block all women not wearing burkhas or depictions of women damaging their ovaries by driving. Appeasing bullies never really works in the long run, so why bother starting. "If you claim not to like it, or at least enjoy telling everyone else repeatedly how you claim not to like it, stop looking at it so much, case closed"


Porn detection still has its uses and you're making the mistake of saying that only puritans are interested in porn detection.

For example if you've got children, given your stance on the matter, you may not necessarily agree that a filter is necessary, but how about being alerted when your children are viewing obscene content? How about being alerted when your children are engaging in sexting?

When it comes to children, maintaining their purity is only one side of the coin, a necessity with which not all people necessarily agree. But the other side of the coin that's a pretty objective fact is that children do get the wrong ideas about what they see and sometimes it happens with adults too, with porn being the main reason why men think they need big penises to satisfy women. And there's a lot of weird porn out there. With improper exposure, a child can end up growing up with certain ideas about sex, with certain complexes and so on.

And I'm not necessarily for censoring that content, as children can find ways around the censorship should they want to, plus these filters aren't perfect anyway. But I would find useful a system that alerts me when my child gets exposed to porn, such that I can take appropriate measures, like having fatherly talks about sex, explaining to him that what he just saw is a really bad idea in case he looked at something weird.

Plus, exposure to porn can happen 100% by accident and that's my personal problem with it. I take my monthly dose of my stockings fetish, but you know, I like to be in control of when that happens. Going to a website and clicking on something can trigger a popup with ads for either poker games or porn. Sometimes they've got sound too. Imagine hearing in the workplace the sound of a woman's moan. It's totally disrespectful to your colleagues, as it disrupts their workflow. I was searching for something on ThePirateBay once and it happened to me.


children do get the wrong ideas about what they see and sometimes it happens with adults too, with porn being the main reason why men think they need big penises to satisfy women

Is it porn that gives them the wrong idea, or is it because porn is the only concept of sex that they get exposed to?

But I would find useful a system that alerts me when my child gets exposed to porn, such that I can take appropriate measures, like having fatherly talks about sex, explaining to him that what he just saw is a really bad idea in case he looked at something weird.

Why wouldn't you have fatherly talks about sex regardless of exposure to porn?

Imagine hearing in the workplace the sound of a woman's moan. It's totally disrespectful to your colleagues, as it disrupts their workflow.

Playing any sound is disrespectful to your colleagues, by breaking their concentration. The pornographic part is irrelevant.


> Is it porn that gives them the wrong idea, or is it because porn is the only concept of sex that they get exposed to?

My wife works at a kindergarten and she has first hand experience with 4-year olds being exposed to porn and the effects really aren't nice.

What do you propose actually? Live demos or experiments under adult supervision?

> Why wouldn't you have fatherly talks about sex regardless of exposure to porn?

Because there is no proper way to bring up people having sex with horses and dogs into a conversation with a child, let alone that the act of making love doesn't necessarily involve slapping a woman and forcefully shoving your dick inside her mouth, as not all women like that.

You either missed my point or you don't have children of your own, but really, I'd love to get your idea of a conversation with a six year old.

> Playing any sound is disrespectful to your colleagues, by breaking their concentration. The pornographic part is irrelevant

Are you seriously implying that all sounds are created equal, especially given that some people end up behaving like children when hearing sounds related to sex?

Do you also fart in elevators? When it's hot outside, do you also come butt-naked at your office? When you get horny, do you make announcements?


Well, I was certainly not thinking about 4 and 6 year olds; in general, I don't see how they would get exposed to porn, unless they were left unattended in front of a web-connected device, and in that case porn wouldn't be my main concern (as difficult as it sounds, I'd rather explain bestiality than suicide bombings). Sorry for the misunderstanding.

I was thinking more about 8-10 year olds, for whom there are plenty of ways to introduce the concept of sex, not only through conversation but also vetted media (e.g. books, films).

I don't have kids, no, but I was involved in the education of my (much) younger brother, and so was in a privileged position to discuss this with him (compared to his parents), and he has grown up knowing about sex way before discovering porn.

Are you seriously implying that all sounds are created equal, especially given that some people end up behaving like children when hearing sounds related to sex?

I'm implying that a porn filter wouldn't solve the real problem. As for people behaving like children when hearing sounds related to sex, I have to say I don't know any above 20 years old.

Do you also fart in elevators? When it's hot outside, do you also come butt-naked at your office? When you get horny, do you make announcements?

No. I also try not to make silly comparisons on online discussions.


People seem not to realise that children take only what they can understand. If you show a 6 year old violent porn, they won't be traumatised by the porn but by the violence. If you show a 6 year old "normal" porn they'll ask lots of questions that you'll find more or less awkward depending on how puritan you are.

The conversation with the 6 year old is easy. It's the conversation with the 13 year old that's the problem. Liberals have "the talk" early. Right-wingers have "the talk" late. Very few people have the talk regularly as the child grows up.

I'm not sure what your last sentence has to do with the argument.


"My wife works at a kindergarten and she has first hand experience with 4-year olds being exposed to porn and the effects really aren't nice."

I think this is made up, or at most the kids are reacting to the teacher freaking out, or trying to tease the teacher knowing it's a great way to freak her out.

It is comical to think back on what I thought of sex-ploitive-ish TV when I was a kid, pre-puberty. Now yes, I know this wasn't "XXX chicks with happy horses" or whatever, merely broadcast TV, but I don't think that would change the reaction very much.

The women on Charlie's Angels? Eh, whatever, the A-Team was a better action show, although even for a kid, a little formulaic.

The actresses on Laverne and Shirley don't wear bras? Who cares, they look like goofballs wearing '50s clothes; this is 1980, let's watch Battlestar Galactica (the original).

The Dukes of Hazzard was all about the car chases; I didn't even notice Daisy's attire (or lack thereof) until my hormones kicked in and suddenly she was very interesting indeed, like from zero interest to 1000+ in what seemed like weeks. When I was young I thought it was a stupid show, like a lame version of Knight Rider, which I much preferred.

So the happy couple is reunited and their relationship rekindled on The Love Boat. Well, good for them, but I have no interest in watching them smooch and grope each other for thirty seconds on camera. Heck, I'd switch to Mr. Belvedere if it was on.

Baywatch, seriously, that is the dumbest show ever if you're under 13 or so. Why is the intro this woman running along the beach in slow motion? If she's rescuing people she should be swimming into the ocean, not running along it; even a dumb kid knows that.

Once the hormones kick in all a filter does is get in the way of whats suddenly tremendously interesting.

I could see it being awkward for a 7th grade teacher where suddenly some of the boys instantly find Miley Cyrus (or whoever) completely fascinating and the rest are all "eh, girl music, who cares". But at grade-K I'm thinking the reaction is going to be pretty minor most likely "eww gross" but mostly "so what".

I would worry more about gore and gore-shock sites, which is getting pretty far off topic.

So I'm a geezer, so what. So we'll try something more contemporary. How many kids actually required therapy after the famous Janet Jackson wardrobe malfunction during the Super Bowl? If they needed therapy, it was probably for the agonizing, awkward experience of watching what 60 year old bald white guys in NYC think suburban teen youth of America think urban rap stars think is cool, and more or less screwing it up hilariously.


I'm actually annoyed that you got modded down. Your post brings up an excellent point, although I don't think it was intended.

These days, it comes down to exposure and access. Back when I was a teen (early 90s) we didn't have the internet as it is. Sure, there were BBSs, but that was about as convenient as trying to get a dirty magazine. While I may have had idle curiosity when I was a pre-teen, I had zero access to it (my father didn't have any magazines, etc.) and therefore it would've taken way too much effort to see it, so it was easily dismissed. In context, it was equally difficult to even see an R-rated movie.

These days, my 8 or 10 year old self could be exposed to literally anything within a couple seconds. It's free and prevalent, and can show up in unexpected places. I don't have kids (will soon) but I don't look forward to dealing with those situations.

As for Janet Jackson, I agree that was overblown. If anyone needed therapy for that it was because their parents were overbearing. However, we aren't talking about a bare breast here, we are talking about simulated rape, bestiality, and some other things that just aren't my thing. But, hey, if they are someone's thing that's fine - let's just try to keep that stuff out of the hands of the kids until they are old enough to reason for themselves about what's really going on.


Harassment laws/lawyers would find you their wet dream. The rule with pictures in any work environment is to err on the side of extreme caution.

More than likely you're not going to find out they don't mesh with the team till they quit or are fired. If you're lucky it's not followed by a lawsuit.

See, it's not their job not to be offended; it is your job to offer a harassment-free work environment. Whom you are catering to depends on what is PC or not at the time; fortunately it's pretty easy to determine whose whims to cater to or not.


I mean, if you read between the lines from his class notes, this is basically what Peter Thiel says: Just don't hire these people. Nobody will say it publicly, because that opens them up to discrimination lawsuits, but that's the reality on the ground.


I'm not sure it is that easy, even HN discussions on the topic don't come to any particular conclusion and that is within a fairly narrow range of people.

In an old job I had the task of cleaning spam off a forum, this meant looking at rather a lot of porn but it would never have occurred to me to sue on that basis, yet that is still a job that needs to be done.


Nope. Harassment is a legal term of art, not a synonym for disagreeable treatment. Harassment occurs when a person or group is singled out for distinct treatment, in the U.S. at least. Subjecting everybody to all types of porn is quite lawful. (There is a famous case of a bisexual pervert who indiscriminately propositioned anything with a pulse. His self-designated victims were bitch slapped out of court.)


What? No.

Sexual harassment is unwelcome sexual advances, requests for sexual favors, and other verbal or physical harassment of a sexual nature, no matter how many people are sexually harassed or the gender of the parties involved.

There have been class action sexual harassment lawsuits. Jenson v. Eveleth Taconite Co. was the first which represented fifteen women.

http://en.wikipedia.org/wiki/Jenson_v._Eveleth_Taconite_Co.

It's fuzzy ground with, say, prominent displays of pornographic material in an office environment. One might argue that it can make the workplace a hostile or overtly sexually charged environment. It is best to err on the side of caution.


Good lord, I've read that case and it's about as far from the pr0n filter example as you can possibly get. That place was beyond nuts and so was the trial. They deserved every penny they got because the workplace was out of control crazy, and merely coincidentally happened to involve male/female issues.

That is NOT a case about accidentally clicking on a web page or debating if huffpo headline stories are sometimes a little too racy, that was a case about a madhouse of stalking and intimidation and tire slashing and... I don't think a mere web pr0n image filter would have saved the mine 3.5 million dollars. Actually having capable supervisors and managers, yeah, that might have worked?


I should have been more specific. The defense "I sexually harass everyone, so it isn't sexual harassment" is just not in any way valid. Full stop. Period.

I used the first class action sexual harassment lawsuit as an example to illustrate the point of "just because it's company culture/I do it to everyone/he's just like that" doesn't make it not harassment.

The GP was giving false information that if you sexually harass everyone, then it magically becomes not sexual harassment and then it somehow gets thrown out in court, but does not cite the relevant court case. Just because you ass grab everyone that walks by doesn't make ass grabbing not sexual harassment by the law.

That's mostly my point.

As far as a p0rn filter, that's a different story. You can look at it a different way, legally.

I'm just playing devil's advocate here as to why it would be a good idea for someone to try to err on the side of caution in the workplace filter rules.

They may want to avoid the situation where person A is sitting in their cube while person B is watching hardcore gay porn next door specifically to harass person A. Person A might feel that the employer didn't do a good enough job of preventing this type of harassment by blocking access to that content.

If the employer doesn't try to filter out porn, or at least create a policy against it, it might look like it is condoning or supporting it while its peers do filter (I just made that up; say 85% of their peers make an attempt to filter), especially if it is well known that managers tolerate risqué images/video if not outright porn, or if company culture tolerates it.

A less grey example is putting a picture of two women who are mostly clothed but kissing in the office breakroom. Everyone sees it, but it creates a sexually charged environment that can be hostile to some. That's a big no-no in HR land.


One important point to make about Taconite and your examples is that I cannot comprehend how any of it could remotely relate to work; management needs to write them up and fire them, not because the victims were women or it somehow had something to do with sex, but because the workplace is apparently completely out of control. This isn't some pie in the sky daydream, this is decades of observation at a variety of F500 companies in the midwest.

The problem with some jerk harassing female coworkers with .. full stop ... harassing coworkers comma, is he's not doing his job, and he's preventing others from doing their job, and the only mystery is why that problem is not being taken care of at that level. It's not pass-the-buck time; why should some dude in another department have to find some harebrained technical scheme to try to temporarily stop the nutcase from bothering people... he's got a boss who's already responsible for that task.

Is this one of those weird coastie vs heartland type issues where we have real bosses so coastie stories sound weird, or ...

That's the crazy part of the story. If the boss simply doesn't care, then blocking certain pix on port 80 is merely going to result in your example of hanging up pictures, or carrying in zip drives, or whatever, obviously the boss won't care about those either. So the net result is a huge waste of time and money for ... nothing?

I would think there's some CYA going on: if you know there's about to be another 3.5 million dollar "Taconite" type lawsuit filed, and it's going to be a slam dunk, whoever it is ordering harebrained technical solutions doomed to failure probably won't be punished as hard as the guy doing the harassment, or his boss who doesn't care, or his boss's boss. But a legal hit like that means tune up the resume anyway; it's not looking good.


But I haven't seen many actual case law writeups published by proper legal authorities that prove a lot of this overzealous activity by HR actually saves the company anything.

I would bet a case of Moët that most of the hyperbolic HR warnings over this stuff actually have no reasonable chance of ever causing a loss.


Oh dear, looks like I pissed off the Puritan nutters, or we have (shudder) induhviduals from human remains reading HN.

I will say it again: in my experience I have yet to see a credible case written up in reputable legal publications detailing where a pron filter or the lack of one made any difference to a case.

Anyone care to post a link to prove me wrong rather than modding me down?


Your last and second last points seem to contradict each other. You claim that people who are into animal sex are "weirdos" and we should draw the line there, then you claim that people who want to block something are bullies who should not be appeased.

Also your claim that all people who want to block pornographic images are really bullies who will not stop until all women are in burkhas is stupid in itself.


Here's an idea...

Develop a bot to trawl NSFW sites and hash each image (combined with the 'skin detecting' algorithms detailed previously). Then compare the user uploaded image hash with those in the NSFW database.

This technique relies on the assumption that NSFW images that are spammed onto social media sites will use images that already exist on NSFW sites (or are very similar to). Then it simply becomes a case of pattern recognition, much like SoundHound for audio, or Google Image search.

It wouldn't reliably detect 'original' NSFW material, but given enough cock shots as source material, it could probably find a common pattern over time.

edit: I've just noticed rfusca in the OP suggests a similar method
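
For illustration, a minimal difference-hash (dHash) sketch of the fingerprint-and-compare idea; in practice a library like the pHash one mentioned elsewhere in the thread would be more robust.

    from PIL import Image

    def dhash(path, size=8):
        # Shrink to (size+1) x size greyscale, then compare each pixel with its right
        # neighbour to get a 64-bit fingerprint that survives resizing/recompression.
        img = Image.open(path).convert("L").resize((size + 1, size))
        px = list(img.getdata())
        bits = 0
        for row in range(size):
            for col in range(size):
                left = px[row * (size + 1) + col]
                right = px[row * (size + 1) + col + 1]
                bits = (bits << 1) | (1 if left > right else 0)
        return bits

    def hamming(a, b):
        return bin(a ^ b).count("1")

    # Two images are "probably the same picture" when the Hamming distance between
    # their hashes is small, e.g. <= 10 of the 64 bits.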


"This multi-TB disk array labeled 'Porn' has a legitimate business use!"


How do you program that sort of thing?

Do you have to tell it what shapes/colors to look for? Or do a combination of overall image similarity combined with localized image similarity and portion-by-portion image comparison?


Maybe recognising the furniture in the background would work too ;) I remember there was a website/catalog of IKEA furniture somewhere made using NSFW photos.


Well, image hashing is distinct from normal MD5 hashing, as the hash does consider similarity of colour, etc., so it's not purely binary. A Google search produced a library at pHash.org that might do something similar.


Is it possible to hash an image so that you can partially match it with subsets of that image (like cropped regions or resizes)? Or a slight modification of that image (colors shifted, image flipped, etc).


I believe so. TinEye uses something similar, because it detects those matches.


Yep. Google "perceptual hash functions".


shameless self-promotion - I wrote a perceptual hasher for PHP : https://github.com/kennethrapp/phasher

FWIW the biggest problem with this is false positives, though admittedly I may just not be clever enough to do it with enough finesse.


Detecting all porn seems to be an almost impossible problem. Many kinds of advanced porn (BDSM, etc.) don't have much skin - often the actors are in latex, tied up, or whatever. It's obviously porn when you see it, but detecting it seems incredibly hard.

Detecting smurf-porn(1) (yes that's a thing...) is even harder since all the actors are blue.

http://pinporngifs.blogspot.dk/2012/09/smurfs-porn.html?zx=7... - obviously very NSFW, but quite funny.


It is possible to get high accuracy if you use machine learning and a sufficiently large training set. That said, even humans sometimes don't agree on whether something is porn or not.


Just saying "machine learning" is not very useful here. What machine learning techniques work well in this case and what are the major pitfalls?

Then you can convince me that a "sufficiently large training set" exists and is smaller than "all the images on the internet".


See my other post on this page.

I would argue that this is less difficult than your average image classification problem. Just have a look at what kind of challenges image classification can tackle, picking the correct class out of 1000s of classes. Porn is normally well-lit, the subject is at the center of the image, etc.

The main difficulty is to define what is porn and what is not... It's easy to see the difference between porn and pictures of bicycles. But how about porn and artistic nudity? You see, it's actually a scale, but you are trying to make a binary decision.

Another problem (at least with the method I explained in my other post) is that portraits sometimes get misclassified. Maybe it could help to integrate face detection. I'd suspect that more recent models would not have this problem (e.g. ones that take more than just local features into account). Other times it makes mistakes where you think "why on earth would it think this is porn". Again, combining different methods should help to eliminate those outliers.

Outliers are also a problem, e.g. black and white pictures. Again, an ensemble of different models (e.g. a color-independent one) might help. Niches are not really a big problem. BDSM porn is, as far as I have seen, the only niche of porn that is really different visually.


Couldn't help but read your second line as talking about the smurf-porn(1) man page. :/

man 1 smurf-porn ?!?!



To this day, I believe the best method for picking out these images is a human censor (with appropriate, company-provided counseling afterward).

Edit: No shortage of stock image reviewer jobs https://google.com/search?hl=en&q=%22image%20reviewer%22

I'm trying to find an interview of one of these people describing what it's like on the other end. It wasn't a pleasant story. These folks are employed by the likes of Facebook, Photobucket etc... Most are outsourced, obviously, and they all have very high turnover.


I seem to remember reading an article about people doing this at Google(?) in a pretty poor state.

Edit: I think it was this one: http://www.buzzfeed.com/reyhan/tech-confessional-the-googler...


Yep! That's the one. Thanks.

A pretty unpleasant job no matter what angle you look at it.

I remember this bit the most : "Google covered one session with a government-appointed therapist — and encouraged me to go out and get my own therapy after I left."

This is the downside to 'abstracting away' the dirty end of filtering. I'm looking forward to a day when this can be properly automated, but, considering the ever-changing nature of erotica to begin with, I don't see that happening any time soon.


Developing a strong AI which can do this without going insane itself is going to be the robo-psyche challenge of the future.


Nobody has discussed i18n and l10n issues? What passes for pr0n in SF is a bit different than in tx.us, and that's different from .eu and from .sa (sa is Saudi Arabia, not South Africa, although they've probably got some interesting cultural norms too).

If you're trying for "must not offend any human being on the planet" then you've got an AI problem that exceeds even my own human intelligence. Especially when it extends past pr0n and into stuff like satire: is that just some dude's weird self-portrait, or a satire of the prophet, and are you qualified to figure it out?


How about a picture of a woman's breasts? What about an erect penis? Sounds like porn, but you might also see these things in the context of health-related pictures or some other educational material.

The classic problem of trying to filter pornography is trying to separate it from information about human bodies. I suspect that doing this with images will be even harder than doing it with text.


Definitely true. Facebook had a dust-up when a woman posted a topless photo of herself after she had had a double mastectomy.

That said, not all sites are like Facebook and we aren't talking about filtering all the images on the internet, just ones on specific sites. One example I can think of is that a forum for a sports team might not want NSFW pictures posted as it would be irrelevant.


Google reverse image search can come up with a search likely to return the given image. Perhaps this can be used for porn classification.


Seems like we had this same problem with email spam, and Bayesian learning filters revolutionized the spam-filtering landscape. Has anyone tried throwing machine learning at this problem?

We as humans can readily classify images into three vague categories: clean, questionable, and pornographic. The problem of classification is not only one of determining which bucket an image falls into but also one of determining where the boundaries between buckets are. Is a topless woman pornographic? A topless man? A painting of a topless woman created centuries ago by a well-recognized artist? A painting of a topless woman done yesterday by a relatively unknown artist? An infant being bathed? A woman breastfeeding her baby? Reasonable people may disagree on which bucket these examples fall in.

So what if I create three filter sets: restrictive, moderate, and permissive, and then categorize 1,000 sample images as one of those three categories for each filter set (restrictive could be equal to moderate but filter questionable images as well as pornographic ones).

Assuming that the learning algorithm was programmed to look at a sufficiently large number of image attributes, this approach should easily be capable of creating the most robust (and learning!) filter to date.

Has anyone done this?
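
For concreteness, here is a minimal sketch of the idea, assuming a crude colour histogram stands in for "a sufficiently large number of image attributes"; the class names, paths, and policy table are illustrative placeholders, not a tested filter.

    import numpy as np
    from PIL import Image
    from sklearn.linear_model import LogisticRegression

    CLASSES = ["clean", "questionable", "pornographic"]

    def features(path, bins=8):
        # Crude attribute vector: a joint RGB colour histogram of the downscaled image.
        px = np.asarray(Image.open(path).convert("RGB").resize((64, 64))).reshape(-1, 3)
        hist, _ = np.histogramdd(px, bins=(bins,) * 3, range=((0, 256),) * 3)
        return hist.ravel() / px.shape[0]

    def train(paths, labels):
        # labels are human judgments like "clean" / "questionable" / "pornographic"
        X = np.array([features(p) for p in paths])
        y = np.array([CLASSES.index(label) for label in labels])
        return LogisticRegression(max_iter=1000).fit(X, y)

    def filter_decision(model, path, mode="moderate"):
        cls = CLASSES[model.predict([features(path)])[0]]
        policy = {
            "permissive": {"pornographic": "block"},
            "moderate": {"pornographic": "block", "questionable": "review"},
            "restrictive": {"pornographic": "block", "questionable": "block"},
        }
        return policy[mode].get(cls, "allow")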


This was my first thought. With a good training set and a savvy algo I believe machine learning can be good with images, and there's an unprecedented number of training sets out there to be scraped...


Everyone is focusing on the machine vision problem but the OP had a good idea:

>There are already a few image based search engines as well as face recognition stuff available so I am assuming it wouldn't be rocket science and it could be done.

Just do a reverse image search for the image, see if it comes up on any porn sites or is associated with porn words.


Relevant:

http://en.wikipedia.org/wiki/I_know_it_when_I_see_it

Basically, it's impossible to completely accurately identify pornography without a human actor in the mix, due to the subjectivity... and especially considering that not all nudity is pornographic.


This is a classic classification problem for machine learning. I'm surprised so many suggestions have involved formulating some sort of clever algorithm like skin detection, colors, etc. You could certainly use one of those for a baseline, but I'd bet machine learning would out-score most human-derived algorithms.

Take a look at the scores for classifying dogs vs cats with 97% accuracy: http://www.kaggle.com/c/dogs-vs-cats/leaderboard. You could use a technique of digitizing the image pixels and feeding them to a learning algorithm, similar to http://www.primaryobjects.com/CMS/Article154.aspx.


I am aware of some nice scholarly work in this space. You may find Shih et al.'s approach of particular interest [0]. Their approach is very straightforward and based on image retrieval. They have also reported an accuracy of 99.54% for adult image detection on their dataset.

[0] Shih, J. L., Lee, C. H., & Yang, C. S. (2007). An adult image identification system employing image retrieval technique. Pattern Recognition Letters, 28(16), 2367-2374.

http://sjl.csie.chu.edu.tw/sjl/albums/userpics/10001/An_adul...


I came across nude.js (http://www.patrick-wied.at/static/nudejs/) when researching for a social network project, seems quite nice and is Javascript based.
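
For reference, the usual baseline this kind of tool builds on is a plain skin-tone pixel ratio; the sketch below uses a commonly cited RGB rule and a rough threshold, and is not nude.js's actual algorithm.

    from PIL import Image

    def skin_ratio(path):
        # Fraction of pixels that pass a commonly cited RGB skin-tone rule.
        img = Image.open(path).convert("RGB").resize((128, 128))
        def is_skin(r, g, b):
            return (r > 95 and g > 40 and b > 20 and
                    max(r, g, b) - min(r, g, b) > 15 and
                    abs(r - g) > 15 and r > g and r > b)
        pixels = list(img.getdata())
        return sum(1 for r, g, b in pixels if is_skin(r, g, b)) / len(pixels)

    # e.g. flag an image for human review when more than ~40% of it looks like skin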


Wouldn't testing for skin colors produce far too many false positives to be useful? All those beach photos, fashion lingerie photos, even close portraits. And how about the half of today's music stars who seem to try never to get caught more clothed than half naked?

Nudity != porn and certainly half-nudity != porn.

I'd rather go for pattern recognition. There's a lot of image recognition software these days that can distinguish the Eiffel Tower from the Statue of Liberty, and it might be useful to detect certain body parts and certain body configurations (for those shots that don't contain any private body part but show two bodies in an unambiguous configuration).


"detect certain body parts"

When I was a kid, we had a firewall at school that tried to filter pornography by doing something similar with text. Doing research on breast cancer turned out to be rather tricky.

So let's say you try to detect certain body parts. Now you have someone who wants to know more about their body, but you are classifying images from medical / health articles as pornography.

"certain body configurations"

So now instead of having trouble reading about my own body, I will have trouble looking at certain martial arts photos:

https://upload.wikimedia.org/wikipedia/commons/1/14/BostonKe...

I am not saying these are unsolvable problems, but they are certainly hard problems. Even using humans to filter images tends to result in outrageous false positives sometimes:

http://abcnews.go.com/Health/breastfeeding-advocates-hold-fa...


You are correct and I'm not saying that I described the Holy Grail of detecting porn in a single paragraph. I'm just pointing in another direction. No solution to a very complex problem could be one-dimensional. Combining several different tests might lead to a solution. E.g. these jujitsu photos should not even be detected as "certain body configurations", as the people there are fully clothed and there aren't many actual bodies visible in the picture (so the aforementioned skin color check definitely should come into play when deciding whether you see a body or not).

At the end of the day I doubt there could be a fully bulletproof and always correct solution using the current state of tech. But you need to factor in much more than just skin color if you try to build an automated solution to this problem.


There's plenty of fully clothed porn out there. There are even tags for it on some porn sites.


When I was in a boarding college we had wardrobe doors that were the perfect colour to set off the skin-tone filters.


Whilst I agree that programmatically eliminating porn images is a very hard problem, programmatically filtering porn websites might be easier, beyond just a simple keyword search and whitelist.

If you assume that porn tends to cluster, rather than exist in isolation, then a crawl of other images on the source pages, applying computer vision techniques, should allow you to block pages that score above a threshold number of positive results (thus accounting for inaccuracy and false positives).
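
A sketch of that page-level scoring, where classify_image stands in for whatever per-image detector you already have and the thresholds are arbitrary:

    def page_is_porn(image_urls, classify_image, threshold=0.3, min_images=5):
        # Block a page when the fraction of its images flagged positive crosses a threshold.
        if len(image_urls) < min_images:
            return False  # too few images to judge the page either way
        positives = sum(1 for url in image_urls if classify_image(url))
        return positives / len(image_urls) >= threshold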


"score above a threshold number of positive results"

How about social scoring? A normal (or even a weirdo) teenage boy would spend less than a second examining my ugly old profile pix, but after ten or so of your known teen male users are detected to spend 5 minutes at a time, a couple times a day, closely studying a suspected profile pix, I think you can safely conclude that pix is not a pix of me and then flag / censor / whatever it for the next 10K users.


The graph theorist in me is rubbing my hands in glee at the thought of seeing if you could extrapolate out that approach to catch a broader range of offensive imagery through relationships and usage patterns.


Key problem: profile pictures are everywhere on a website by definition. People look at them for over an hour at a time depending how your site is laid out.

If you wanted to spot shock imagery it's easier - study how quickly users navigate away or rapidly scroll past.


You can use APIs like these to do nude detection - https://www.mashape.com/search?query=nude


If you're interested in Machine Learning, the outstanding Coursera course on machine learning just started a couple of days ago. It covers a variety of machine learning topics, including image recognition. The first assignment isn't due for a couple of weeks, so it's a perfect time to jump in and take the machine learning course!

https://www.coursera.org/course/ml


Algorithmic solutions will always be hard. "I know it when I see it" is hard to program.

Depending on the site, I'd go to a trust-based solution. New users get their images approved by a human censor (pr0n == spambot in most cases). Established users can add images without approval.

If you're going to try software, try something that errs on the side of caution, and send everything to a human for final decision-making, just like spam filters.


"You can programatically detect skin tones - and porn images tend to have a lot of skin. This will create false positives but if this is a problem you can pass images so detected through actual moderation. This not only greatly reduces the the work for moderators but also gives you lots of free porn. It's win-win."

hilarious!


Pornography is so creative that I find it hard to see how one algorithm could detect it all. Looking for features certainly wouldn't catch the weirder stuff.

Maybe a good approach is an image lookup, trying to find the image on the web and seeing if it appears on a porn site, or a pornographic context.


It seems to me that if you could somehow solicit comments on the picture, you then could do text analysis on the comments to see if someone thought they were porn or not. (Well, I'm being a little silly, but there's a germ of an idea there.)


A corollary of Rule 34 is that an algorithm to classify porn is NP-Hard.

Um, so to speak.


So, who's going to write the ROT13 algorithm for images? Just call it ROT128, rotate the color values, and use a ROT128 image viewer to view the original image.
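
A tiny sketch of that ROT128 idea with Pillow; applying it twice restores the original, just like ROT13 on text.

    from PIL import Image

    def rot128(in_path, out_path):
        # Add 128 (mod 256) to every channel value.
        Image.open(in_path).convert("RGB").point(lambda v: (v + 128) % 256).save(out_path)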


That function is typically called "Invert".


Probably the easiest way is with motion and sound. Checking for skin would be hard depending on the type of content, as mixmax pointed out.


It wouldn't solve the entire problem, but you could look for the watermarks that major porn networks stamp on their images.


by filename ;)


Maybe we can channel Potter Stewart into an algorithm somehow?


That's what I was thinking, counselor.


Would anyone be interested in purchasing an API subscription for this kind of service? IMO, a pipeline of AI filters can be effective to some extent.


Put it in Mashape :) (Disclaimer: I work for Mashape)


Invent an algorithm that can calculate humanity's creative thoughts.


Use CrowdFlower.


machine learning? because you also want to filter cats.


Machine learning is a field of research, not a magical incantation that you say to solve everything.




