Images that fool computer vision raise security concerns (cornell.edu)
365 points by lm60 on March 23, 2015 | 210 comments



This is some excellent research!

It reminds me of the CV dazzle anti-facial-recognition makeup that made the rounds a while ago: http://www.theatlantic.com/features/archive/2014/07/makeup/3...

This definitely reinforces my belief that having humans in the loop is not only desirable but necessary. For the majority of human history minus a few years you could only be accused of a crime by another human being. I'd like to see the trend of automated "enforcement" reversed, and to codify into law that you MUST be accused by a human being.

If everyone is breaking so many laws that the police and courts can't keep up it doesn't mean that humanity is broken. It means that the law has gotten so far out of sync with humanity that the law is broken. People make the laws, not the other way around.


> If everyone is breaking so many laws that the police and courts can't keep up it doesn't mean that humanity is broken. It means that the law has gotten so far out of sync with humanity that the law is broken. People make the laws, not the other way around.

The world would be a much better place if more people realized this.


This always frustrates me when discussions of plea bargaining and the right to trial come up, and the argument is given that plea bargaining is a necessity because the courts would be horribly overloaded if every case went to trial.

If the system doesn't have the resources to give every accused criminal a fair trial, then either you're making too many criminals, the system doesn't have enough resources, or both. Bypassing trials is just a way to cover your ears and shout "la la la" to ignore the problem.


You're right; there is tremendous pressure from both district attorneys and judges to cop a plea. Trials piss everyone off because they are "such a waste of time." Criminal defense attorneys have to contend not just with the facts of the case and legal precedents, but also the extent to which demanding a jury trial turns the court against your client -- and your other clients, by association.

The other issue is police overcharging crimes. For example, suppose you, in a fit of pique, cut your roommate's arm with a knife. Not cool; that's aggravated assault. But the police then charge you with attempted second degree murder -- or even first degree (e.g., premeditated). Now your defense attorney has to work hard just to get your charges down to a reasonable level: the agg assault you should have been charged with in the first place. This is what a huge majority of plea bargaining is about; not getting away with it, but getting the charge down to something that describes your actual offense.

(Source: longtime criminal defense nerd)


While I largely agree with you, it strikes me that you've missed an option --

It might be that there's something inefficient in the trial process that has no bearing on "fairness". I'm not a legal expert by any means, so it may not be the case, but it seems as though it's a possibility.


You certainly could be right. The way trials are run does not seem very efficient at all, and maybe a lot of the fat could be cut out without impacting outcomes.

On the other hand, if you can get the number of accused criminals down to a reasonable level, it wouldn't matter too much if there was waste in the smaller number of trials.

No doubt the problem can be attacked from many directions.


The inefficiency stems from the complexity of the law. We have built up over many years an unfathomably large codex and an equally staggering infrastructure devoted to training legions of people (lawyers) in how to read, interpret and apply it.

How do we address this?

One emerging solution employs advanced AIs to pore over the reams of evidence in order to better inform lawyers of the legal situation. I can imagine extrapolating this particular methodology far enough into the future such that the entire process subsequent to arrest (processing, pre-trial hearing, trial and sentencing) might be automated to the degree that it could be accomplished within minutes instead of months.

What would society look like if we had such an "efficient" judicial system?

That's a potentially terrifying possibility. It may be that we'd end up incarcerating orders of magnitude more people; a dystopian reality. How would we then rein in such a powerful system? I have no idea.


> The inefficiency stems from the complexity of the law.

Not just that, though -- it also comes from the inherent inefficiency of trying to reconstruct exactly what happened in any situation where the question of whether a crime was committed arises.

We could start recording everything that happens, but... that's also a potentially terrifying possibility.


> We could start recording everything that happens

You're in luck! Facebook, Google, and many of the wonderful, selfless people who post on HN are already working on that... or at least defending others' "right" to do so if they aren't doing it themselves.

Various entities are tracking who you know, who you sleep with (http://www.whosdrivingyou.org/blog/ubers-deleted-rides-of-gl...), what your face looks like, what your friends' faces look like (thanks, photo tagging enthusiasts!), who you talk to, what you say to them, when you say it, where you go (thanks to various wonderful sources, including ALPR companies like Vigilant), how long you stay there, what you eat, what you wear, what you watch, what you listen to, where you move your mouse while viewing websites, who your doctor is, what medications you take, what the symptoms of that last rash you had were, what your political views are, what you read, where you work, how many steps you took today, what websites you visit, and about 30,000 other bits of data... just to keep you safe!

The future is so amazing! I don't know what I would do without a customized advertising experience(tm). Such a drastic improvement over life in the past where people were so bored of advertisements that they chose to avoid watching them! What none of us knew at the time was that we really just wanted to see more relevant advertisements more often, while giving up our privacy for the corporations' greater good! I sleep much more soundly after a solid day of being bombarded with advertisements that teach me to be a better consumer!


This is a big issue in the US and it actually goes back to the Warren court. They issued a long series of rulings making it difficult to prosecute cases, without worrying about the consequences.

By the 70s crime had skyrocketed and it was clear that they had gone too far. But instead of issuing a mea culpa and reexamining past rulings, the various courts started allowing prosecutors to claim broad new powers and take extremely aggressive tactics.

By this point there's no real way to fix it. The legal system is based strongly on precedent. It can't just undo major rulings of the past and replace them with something sane.


>By this point there's no real way to fix it. The legal system is based strongly on precedent. It can't just undo major rulings of the past and replace them with something sane.

Sounds exactly like the point where a major fix must be done. Life changes faster and faster. The precedent system was great in the 15th century, when things looked pretty much the same as they did in the 14th or 13th.


Do you know of any good sources to read up on this? I'd always thought that the rise of plea dealing was linked to minimum sentencing guidelines.


By the 70s crime had skyrocketed and it was clear that they had gone too far.

Any claim that court rulings enabled crime is highly debatable. Factors causing increases and decreases in crime rates are difficult to pinpoint.

One argument is that returning Vietnam veterans led to the increase in violent crime -- post-war eras have a history of being associated with crime increases from the return of veterans habituated to violence.


The crime wave you refer to tracks pretty well with the demographic bulge from the baby boom; one of the strongest predictors of crime rates has consistently been the proportion of the population that is males in the 16-24 age range.

> The legal system is based strongly on precedent. It can't just undo major rulings of the past and replace them with something sane.

Sure it can; major court rulings have been overturned by later courts, and, even if courts won't do that, the law which they were interpreting when they made the rulings (including the Constitution) can be changed, rendering the old rulings moot.


> They issued a long series of rulings making it difficult to prosecute cases, without worrying about the consequences.

Um, so? Then we need to allocate more resources to prosecute cases.

If I am falsely accused, I want my day in court. And I want it to be fair. The current system has problems on both fronts.


> If I am falsely accused

The problem is that there's no way for anyone else to determine a priori if your accusation is false or true. That's what the whole presumed innocent until proven guilty thing is about.

In reality, if you are accused AT ALL, you want your day in court and you want it to be fair. Even if you had committed a crime, if the police did something they're not supposed to, that needs to get sussed out in court and you should go free.

Half the point of a trial is to make sure that nothing unfair is done by the investigators (police, prosecutor, etc). This is to keep their power in check so that they'll follow the rules. Otherwise it could get mighty tempting to fudge something a little bit "because we KNOW this is the guy!" and "we need to do the right thing."


Why should every crime go to trial? If the offender says "Fair cop, guv, you've got me bang to rights", why waste everyone's time proving that (s)he did it?

It's basically the same as setting up your conditions to take advantage of short-circuit evaluation. You don't put the time-consuming and resource-hungry part first.
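For non-programmers, the analogy is to code along these lines (hypothetical names, just to illustrate short-circuit evaluation), where the expensive branch only runs when the cheap check hasn't already settled the question:

    # Short-circuit evaluation: the cheap check goes first, and the expensive
    # call is skipped whenever the cheap one already decides the outcome.
    def is_guilty(pleads_guilty, run_full_trial):
        return pleads_guilty or run_full_trial()  # run_full_trial() never called on a guilty plea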

Were there even the slimmest chance of acquittal, few defendants would utter that phrase without there being either a benefit to owning up, or an extra penalty for not doing so. This is what plea bargaining and TICs are for.

If there's a suspicion of false confession, meaning that the suspect may not be the offender, that should normally be sorted out before court, so that the correct offender is tried for the correct offence (e.g. wasting police time if voluntary, or some kind of intimidation/coercion offence if not)?


By all means, we should allow defendants to simply confess and plead guilty if they wish.

But we should not reward them for doing so. Never should a person be presented with a choice between a certain lesser punishment, or a fair trial and a potential greater punishment. That's the "bargain" in "plea bargain," and it's completely reprehensible.

You say, "few defendants would utter that phrase without there being either a benefit to owning up, or an extra penalty for not doing so." That's exactly how it is! A plea bargain isn't just, "we both know you did it, so just confess and let's skip all the lawyers and stuff." It's always, "We both know you did it, so confess and we'll let you out early. If you insist on taking this to trial then we will throw the book at you." A lot of innocent people will take the plea when faced with that choice.

Not every crime needs to go to trial, but every accused criminal needs to have the right to a trial without being punished for exercising that right. If you remove the punishment then you'll no longer have plea bargains, just pleas.


In my view a vital aspect of the trial is to provide necessary public oversight of the police and courts.

I think that a partial measure towards reforming plea bargains would be to require the police to present their evidence for the court to review before entry of a plea. This creates a public record that somebody could investigate in the future. There could also be a provision that if exculpatory evidence is revealed in the future, the plea can be rescinded.


>>Were there even the slimmest chance of acquittal, few defendants would utter that phrase without there being either a benefit to owning up, or an extra penalty for not doing so

Oh man, that's so wrong. Even where there is a LARGE chance of acquittal, many choose a plea bargain because they cannot afford a good attorney or because the prosecutor is threatening them with something crazy like 40 years for downloading a movie. Would you take the risk of 40 years, knowing you're innocent? Do you have enough faith in a jury to put the rest of your life in their hands? I doubt it.


And some choose suicide over the plea bargain.


For some charges, even a full acquittal isn't enough to recover your life.


I believe his point isn't that every crime necessarily needs to go to trial, it's that we should have enough resources to be able to put every accused criminal to trial. The fact that it's not the case shows the significant disconnect in our legal system.

That said, I don't believe it's very difficult to make an ethical and legal case against the concept of plea bargaining in the first place. I would argue it's very important to a free society to waste everyone's time proving that an accused person is guilty, and that not doing so violates the sixth amendment and fundamental human rights.


The issue isn't whether every case should go to trial. A plea bargain is an exchange of (forgoing) a trial for a reduced charge or sentence. The undesirable outcome (for pretty much everyone but the prosecutor) is that an innocent person accepts a plea bargain to avoid the worst outcome, and we should do more to prevent this from happening.

> If there's a suspicion of false confession, meaning that the suspect may not be the offender

Are you assuming that this represents a minority of cases? Who has to suspect that the accused is not really guilty, a jury of his peers?


>The issue isn't whether every case should go to trial. A plea bargain is an exchange of (forgoing) a trial for a reduced charge or sentence.

In reality it becomes a punishment for demanding a fair trial.


> If there's a suspicion of false confession, meaning that the suspect may not be the offender, that should normally be sorted out before court

How? Court is the process for sorting that out and assessing whether there is doubt, and the extent of that doubt.


The problem is that you can't get very far into the discussion before all the available options involve overhauling the entire system. And there's no way, realistically, to do that.


> And there's no way, realistically, to do that.

Yes, yes there is. We start with the most victimless crimes (drug possession for personal consumption) and go from there.

Police unions will fight it. Prison unions will fight it. The court system will fight it. Nevertheless, it must be done.


In support of your position, how did we ever let people convince us that victimless "crimes" were crimes? The definition of crime rests on the concept of harm, which requires a victim.


So-called "victimless crimes" are crimes where the harm perceived is a diffuse harm rather than a focused harm on a particular individual.

There's really no reason why diffuse harms should not be criminal while focused ones should (arguably, diffuse harms are less able to be addressed by individual action through the civil justice system, so outside of focused harms where the victim is unable to pursue action after the crime, like murder, diffuse harms are the most important for government to address directly.)

OTOH, there's often a lot more disagreement over whether what some people perceive as a diffuse harm actually is a harm.


I've never seen "victimless crime" defined as diffuse versus focused harm, and I would strongly disagree with describing it as such.

For example, dumping mercury into a major river causes extremely diffuse harm, but I'd never describe it as a "victimless crime."

In the other direction, smoking weed in private harms nobody except the smoker. The harm is so focused it doesn't even touch anyone besides the offender. Yet this is almost the canonical example of "victimless crime."

It's not about diffuse harm, it's about whether anyone was harmed at all besides the accused.


> In the other direction, smoking weed in private harms nobody except the smoker.

This is precisely the point that is in contention. The entire argument for prohibiting marijuana is that this is, in fact, not the case, and that, through a number of indirect channels, people "smoking weed in private" harms others throughout society in a variety of ways.

Obviously, as I said, there is considerable disagreement about whether this is, in fact, the case, (and, also, as to whether, even if it is, prohibition mitigates or exacerbates these harms, and as to whether, in any case, prohibition is an ethically-acceptable response even if the harms exist and prohibition mitigates them.)

But it certainly is not the case that those who support prohibition generally agree that the crimes which those opposed to prohibition describe as "victimless" actually are.


I still don't see any connection.

Lots of what I would call "victimless crimes" are outlawed because of what the proponents of criminalization see as focused harm. Prostitution, for example, is seen as either harming the prostitute, or harming the patron's family. For drugs, it's often considered that the harm is focused on the user, not diffuse.

And again, lots of crimes with diffuse harm are generally agreed on to not fall into the "victimless" category, like dumping toxic materials.


The theory goes that the drugs impact the user and since the user becomes a brain damaged criminal afterwards, society will eventually need to deal with the user's shit.

There is some truth to this for some drugs, but not even close to most.


Sure, some people hold to that diffuse harm theory. Others hold to a theory of specific harm. Others hold to a theory of no harm.

My point is just that there is no connection, as far as I can see, between crimes which some people see as presenting diffuse harm, and crimes which other people see as being victimless. All combinations are not only present but common.


Funny thing about prostitution is if you tape it and put it online it suddenly becomes legal porn.


I think the "diffuse harm" in the case of smoking weed in private is not talking about the act of smoking and who is or is not affected by the chemicals involved, etc.

It is rather looking at the overall picture of "What are the consequences of allowing weed to be bought/sold/grown. Is there harm in it?" The whole infrastructure, not just the end user.


Sure, some people look at it that way, some people look at it other ways. I just see no link between people thinking that a crime causes diffuse harm, and people thinking that a crime is victimless.


Maybe second-order effects is a better way to describe it than 'diffuse'. Take the following:

>In the other direction, smoking weed in private harms nobody except the smoker. The harm is so focused it doesn't even touch anyone besides the offender. Yet this is almost the canonical example of "victimless crime."

It's true that this doesn't directly harm anyone. However, if the smoker is doing this 'illegally' (without a medical license, not in WA or CO, etc) and didn't grow it, he/she is participating in and supporting an illegal drug market via increased demand. If the smoker weren't participating, it would reduce the demand that drives smugglers and the horrific things in Mexico.


As far as I understand this comment, you're saying that smoking weed causes harm because it's illegal, and it's illegal because it causes harm. That's a bit... odd.


None of what you just said is an argument against "And there's no way, realistically, to do that." which many people (myself included) believe when you're talking about a systematic overhaul of the entire criminal justice system. Saying "oh well it must be done" doesn't negate the fact that practically speaking, it's impossible.


We start with the most victimless crimes (drug possession for personal consumption) and go from there.

Did I not call that out enough? My apologies if so. Stop wasting time prosecuting what doesn't provide value in prosecuting.


Significantly reducing the number of criminals is fairly easy without overhauling the system. Decriminalize drug use and possession. That'll cut the number of trials (or pleas) substantially all by itself, and even more so if you assume that knock-on effects will reduce crime overall, as many drug legalization proponents think it would. Other nonviolent offenses could be cut back as well. Stop putting people in prison because one of them paid the other one for sex, for example. Here in Virginia, you can potentially go to jail for a year (and, obviously, have a trial if you want it) for driving at 80MPH in a zone marked for 70MPH!

Increasing resources for holding trials is also not terribly difficult and doesn't require any sort of overhaul. It's just a problem of money, and not a big one. Just for a random example, it looks like the courts account for about 0.3% of my local county budget and about 1% of my local state budget. We could literally increase court resources by a factor of 10 with only a modest increase in taxes to fund it.


> you're making too many criminals

80% of property crime goes unsolved, so unless you legalize vandalism and theft there are going to be criminals.


Perhaps this is another way out of our problem? We could encourage police to "solve" property crimes rather than drug crimes? In analogy to their awful drug-war asset seizures, perhaps they could get a cut of items recovered/reimbursed? I'm sure they would eventually twist such an incentive structure into something else that's awful and unconscionable, but at least in the meantime they might stop with the SWAT raids?


Could we not just outlaw SWAT raids?


It's difficult (not impossible!) to directly oppose politically well-connected factions on their bread-and-butter issues. We might have more luck saying to police departments "here is another way you can have lots of money to spend" rather than "we're taking away federal support for the Drug War, which for some time has been the only way you can hire more cops and procure more equipment". The trick would be to make sure that the existing force is redirected into more benign directions, rather than simply growing to handle new activities while perpetuating the Drug War.


Can we not just defund them? Can't "Starve The Beast" be used for good?


Play it out in your head. An elected official declares he wants to try and starve a powerful and relatively popular union. What happens next?


Unions have no necessary connection to the process.

The elected official's opponent in the following election simply runs a "JOHN SMITH IS SOFT ON CRIME, JOHN SMITH IS BAD FOR US AND OUR FAMILIES" campaign, and people who think public-sector unions are the devil will still vote for the opponent, because now you're pressing their bias (which is in favor of "tough on crime, lock 'em all up and throw away the key").


Are we talking about Wisconsin governor Scott Walker? Because he survived a recall vote and then went on to help make Wisconsin a right-to-work state.

But wait. He specifically exempted police unions from his busting legislation, didn't he?


And after giving away massive tax breaks, is now pleading that the budget can't be balanced.

http://www.washingtonpost.com/blogs/govbeat/wp/2015/02/18/sc...

> To help close the state’s $283 million budget shortfall this year, Wisconsin Governor Scott Walker (R) plans to skip a $108 million debt payment scheduled for May.

> By missing the May payment, Walker will incur about $1.1 million in additional interest fees between 2015 and 2017. The $108 million debt will continue to live on the books; Walker’s budget proposal for 2015-2017 will pay down no more than about $18 million of the principal.

> In March last year, Walker signed a $541 million tax cut for both families and businesses. At that point, Wisconsin was facing a $1 billion budget surplus through June 2015, the Journal Sentinel reported.


I didn't say it was pleasant. What's necessary rarely is.


So you're advocating some sort of violent overthrow? We can't just snap our fingers and remake the political system into something completely different than it is, no matter how much Lawrence Lessig wishes we could. Grim fanatical purity is good for fundraising, but it won't get any laws passed.


No, I'm advocating for politicians to make decisions that are better for society, but will cause them to not be re-elected (similar to what happened in Australia when they outlawed firearms).


So you want the whole system to be based on the assumption that human beings are good? That is every bit as realistic as the assumption that USA is identical to Australia. b^)


I agree with you, but the Port Arthur massacre happened 8 weeks into Howard's term as PM of Australia. The Howard government was in power for 10 years after that, they won four straight elections.


Their solution would probably be to conduct SWAT raids and roadblocks to find stolen property, continuing the trend of constitutionally questionable practices.


"Unsolved" is very different from "unprosecuted". The question here is why the courts are overloaded, not why there isn't additional load due to the 80% of property crimes you mention.


Or fix the underlying problems that cause vandalism and theft.


The 80% of property crime that goes unsolved is, by definition, not causing an excessive load on the court system.


Another thing I don't like is legislators making laws without thinking about enforcement, which is why I don't like anti-discrimination laws.


I think it is this way for 2 reasons... one, for-profit prisons and two, the ability for the government to put anyone away for a while


Well, I think a large percentage of people do realize this.

They just forget it as soon as the news media present whatever is the current horror story.


The fact that this can and does occur means the idea of laws is broken.

But that's just the anarchist in me speaking.


>> It reminds me of the CV dazzle anti-facial-recognition makeup

I had a similar thought. An app which takes the image of a target - Julian Assange say - and overlays it onto your facial image as thousands of virtual stickers. Gradually the evolutionary algorithm morphs this overlay towards your real facial image; its fitness function being fewer, more unobtrusive stickers. All the while maintaining compatibility with target facial recognition.

One day, J.A. escapes from his embassy lair. Thousands of people simultaneously appear on the streets wearing physical stickers on their faces, addressing the London panopticon: "I am SpartAssange".


Someone talking about the Kinect said that most all facial recognition algorithms require the face to be level horizontally; if the face is tilted at an angle then it doesn't work. Not sure if true, but interesting observation.


That's really not that interesting. I work as a research scientist at a face recognition company. If you want, you can detect faces at any angle, but sometimes for speed you will only check upright faces.


it might not be interesting to you as a researcher, but to me as a participant in a world increasingly filled with face recognition products it's worth knowing that different products will have different capabilities based on their "speed" requirements.


I am not a researcher, but I would suggest that perhaps you would find that in the case where a crowd was being parsed in realtime (looking out for somebody so we can catch them before they leave the station for example), the "speed" dial would be turned up as high as necessary, but that the video stream would be stored and the data set reprocessed at a later time with it at a better setting (so you can still say "yes she was here at x time, though we missed her")

in other words, the bandwidth of the image recognition product is not the same as the bandwidth of the video storage.


Yes, highlighting what is possible always needs to be taken into account. But again, what is possible is not the same as what happens in reality. Just because we can't rule something out does not make it certain either.

> the "speed" dial would be turned up as high as necessary

It's possible, and probable given the situation (i.e. government, terror acts), but are there situations where this wouldn't happen in a different circumstance? Would mall security, or a marketing company operating an advertising product, have the ability, authorization, know-how, or financial incentive to do so if this wasn't a life/death situation?

Not every application of facial recognition technology will be targeted to terror suspects/bombings.


It's not just the recognition front end that is problematic. In context of the issue of mis-recognition, Aphyr's tear down of various backends [1] -- which for all we know are in use by the lettered agencies -- gives me the willies.

[1]: https://aphyr.com/tags/Distributed-Systems


Humanity isn't some static thing. Cultures become dysfunctional just like families. Sometimes they have to be changed or they will collapse. It's disingenuous to suggest 'everybody would be happy if only the cops would quit hassling me'. That's sophomoric.


It's not excellent research. All they did is create a naive NN without any feature recognition and noted that it has a high rate of false positives when presented with random imagery. Duh.


Really? From the article:

"They tested with two widely used DNN systems that have been trained on massive image databases."

Is this wrong?


GIGO. Just because they used a "Deep" architecture doesn't mean it was suitable for the task.


Does your criticism extend to the whole enterprise of facial recognition, or did this research somehow fail to include the secret sauce that makes it work?


This work has led to some unfortunate misconceptions.

In particular, this weakness has nothing to do with Computer Vision and also nothing to do with deep learning. They only break ConvNets on images because images are fun to look at and ConvNets are state of the art. But at its core, the weakness is related to use of linear functions. In fact, you can break a simple linear classifier (e.g. Softmax Classifier or Logistic Regression) in just the same way. And you could similarly break speech recognition systems, etc. I covered this in CS231n in "Visualizing/Understanding ConvNets" lecture, slides around #50 (http://vision.stanford.edu/teaching/cs231n/slides/lecture8.p...).

The way I like to think about this is that for any input (e.g. an image), imagine there are billions of tiny noise patterns you could add to the input. The vast majority of those billions are harmless and don't change the classification, but given the weights of the network, backpropagation allows us to efficiently compute (with dynamic programming, basically) exactly the single most damaging noise pattern out of all of them.
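As a concrete illustration (a minimal numpy sketch with made-up weights, not the paper's setup): for a plain logistic-regression "classifier" the gradient of the score with respect to the input is just the weight vector, so the single most damaging small perturbation is a tiny step along sign(w).

    import numpy as np

    rng = np.random.RandomState(0)

    d = 784                       # e.g. a flattened 28x28 image
    w = rng.randn(d) * 0.1        # stand-in for learned weights
    b = 0.0

    def p_class1(x):              # logistic regression confidence for "class 1"
        return 1.0 / (1.0 + np.exp(-(w.dot(x) + b)))

    x = rng.rand(d)               # some ordinary input
    print(p_class1(x))            # somewhere in the middle

    # d(score)/dx is just w, so nudge every pixel a tiny bit along sign(w):
    eps = 0.05
    x_adv = np.clip(x + eps * np.sign(w), 0.0, 1.0)
    print(p_class1(x_adv))        # now confidently "class 1", though x_adv looks like x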

All that being said, this is a concern and people are working on fixing it.


Hi Andrej,

I agree with most of what you say, but note that nearly all of the images in the paper were generated without the gradient. I.e. all the images produced by evolution did not use the gradient, only the output of the network regarding its prediction confidence. There are some images that use the gradient, but only to show a 3rd class of "fooling images".

PS. It's nice to see our work (both this paper and the NIPS paper on transfer learning) in your class. Thanks for including it. I wish I could have my students take your course!


> This work has led to some unfortunate misconceptions.

Agreed; the weaknesses reported should definitely not be taken to affect only convnets or only deep learning. Ian's "Explaining and Harnessing Adversarial Examples" paper (linked by @Houshalter) should be required reading :).

> backpropagation allows us to efficiently compute (with dynamic programming, basically) exactly the single most damaging noise pattern out of all billions.

True. By using backprop, one can easily compute exact patterns of pixelwise noise to add to an image to produce arbitrary desired output changes. However, it's an important detail that most of the images in the paper (all except the last section) were produced without knowledge of the weights of the network or by using backpropagation at all. This means a would-be adversary need not have access to the complete model, only a method of running many examples through the network and checking the outputs.

> ...there are billion tiny noise patterns you could add to the input.

Perhaps because the CPPN fooling images were created in a different way (without using backprop), they seem to fool networks in a more robust way than one might think. Far from being a brittle addition of a very precise, pixelwise noise pattern, many fooling images are robust enough that their classification holds up even under rather severe distortions, such as using a cell phone camera to take a photo of the pdf displayed on a monitor and then running it through an AlexNet trained with a different random seed (photo cred: Dileep George):

http://s.yosinski.com/jetpac_digitalclock.jpg http://s.yosinski.com/jetpac_greensnake.jpg http://s.yosinski.com/jetpac_stethoscope.jpg

I thought this was surprising the first time I saw it.


The two classes that you describe: 1. Adversary has the weights and architecture and 2. Adversary can only do forward pass and observe output, are equivalent when all you're trying to do is compute the gradient on the data. In case 1 I use backprop, in case 2 I can compute the gradient numerically, it just takes a bit longer. Your stochastic search speeds this up.
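For anyone curious what "compute the gradient numerically" amounts to, here is a rough sketch (assuming only a black-box predict(x) that returns the target-class confidence; one forward pass per input dimension, so slow but workable):

    import numpy as np

    def numerical_input_gradient(predict, x, h=1e-3):
        # Finite-difference estimate of d predict(x) / dx using forward passes only.
        grad = np.zeros_like(x)
        base = predict(x)
        for i in range(x.size):
            x_plus = x.copy()
            x_plus.flat[i] += h
            grad.flat[i] = (predict(x_plus) - base) / h
        return grad  # step along this direction to raise the target-class confidence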

Likewise, I was not very surprised that you can produce fooling images, but it is surprising and concerning that they generalize across models. It seems that there are entire, huge fooling subspaces of the input space, not just fooling images as points. And that these subspaces overlap a lot from one net to another, likely since they share similar training data (?) unclear. Anyway, really cool work :)


> Likewise, I was not very surprised that you can produce fooling images, but it is surprising and concerning that they generalize across models. It seems that there are entire, huge fooling subspaces of the input space, not just fooling images as points. And that these subspaces overlap a lot from one net to another,

Agreed. That is surprising, and also increases the security risks, because I can produce images on my in-house network and then take them out into the world to fool other networks without even having access to the outputs of those networks.

> likely since they share similar training data (?) unclear.

The original Szegedy et al. paper shows that these sort of examples generalize even to networks trained on different subsets of the data (and with different architectures).

> Anyway, really cool work :)

Thanks. :-)


> Agreed. That is surprising, and also increases the security risks, because I can produce images on my in-house network and then take them out into the world to fool other networks without even having access to the outputs of those networks.

Good point. You could also do this with the gradient version too (fool in-house using gradients -> hopefully fool someone else's network), but the transferability of fooling examples might differ depending on how they are found.


I've quite enjoyed reading your paper since it was uploaded to arxiv in December and I have been toying with redoing the MNIST part of your experiment on various classifiers. (I'm particularly interested to see if images generated against an SVM can fool a nearest neighbor or something like that.)

But I'm having problems generating images: A top SVM classifier on MNIST has a very stable confidence distribution against noisy images. If I generate 1000 random images, only 1 or 2 of them will have confidences that are different from the median confidence distribution. That is, all the images are classified with the same confidence as class 1. They also share the same confidence for class 2, etc.

So it is very difficult to make changes that affect the output of the classifier.

Any tips on how to get started with generating the images?


I would just unleash evolution. 1 or 2 in the first generation is a toehold, and from there evolution can begin to do its work. You can also try a larger population (e.g. 2000) and let it run for a while.
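Something like the following bare-bones hill-climbing loop is enough to get started (a much cruder cousin of the CPPN evolution in the paper; confidence(img, target_class) is whatever scoring call your classifier exposes):

    import numpy as np

    def evolve_fooling_image(confidence, target_class, shape=(28, 28),
                             pop_size=2000, generations=500, seed=0):
        # Random-mutation hill climbing on raw pixels, guided only by the
        # classifier's confidence for the target class (no gradients needed).
        rng = np.random.RandomState(seed)
        pop = rng.rand(pop_size, *shape)                   # random starting images
        for _ in range(generations):
            scores = np.array([confidence(img, target_class) for img in pop])
            parents = pop[np.argsort(scores)[-pop_size // 10:]]  # keep the top 10%
            children = parents[rng.randint(len(parents), size=pop_size)].copy()
            children += rng.randn(*children.shape) * 0.05  # small random mutations
            pop = np.clip(children, 0.0, 1.0)
        return max(pop, key=lambda img: confidence(img, target_class))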


> ...in case 2 I can compute the gradient numerically, it just takes a bit longer.

Yep, true, might just take a while. On the other hand, even a very noisy estimate of the gradient might suffice, which could be faster to obtain. Perhaps someone will do that experiment soon. Maybe you could convince one of those students of yours to do this for extra credit?? ;).

> Likewise, I was not very surprised that you can produce fooling images, but it is surprising and concerning that they generalize across models.

Ditto x2.

> It seems that there are entire, huge fooling subspaces of the input space, not just fooling images as points. And that these subspaces overlap a lot from one net to another, likely since they share similar training data (?) unclear.

Yeah. I wonder if the subspaces found using non-gradient based exploration end up being either larger or overlapping more between networks than those found (more easily) with the gradient. Would be another interesting followup experiment.


Wait - they didn't use knowledge of the neural network internal state to calculate these patterns? Does that mean they could create equivalent images for human beings? What would those look like!


No, but we did make use of (1) a large number of input -> network -> output iterations, along with (2) precisely measured output values to decide which input to try next. It may not be so easy to experiment in the same way on natural organisms (ethically or otherwise).

Of course, if you're as clever as Tinbergen, you might be able to come up with patterns that fool organisms even without (1) or (2):

https://imgur.com/a/ibMUn


Perhaps a single experiment on millions of different people? A web experiment of some kind? "Which image looks more like a panda?" and flash two images on the screen.


That's a good idea, though note that there's a difference between asking "Which of these two images looks more like a panda?" and "Which of these two images looks more like a panda than a dog or cat?". The latter is the supervised learning setting used in the paper, and generally could lead to examples that look very different than pandas, as long as they look slightly more like pandas than dogs or cats. The former method is more like unsupervised density learning and could more plausibly produce increasingly panda-esque images over time.

A sort of related idea was explored with this site, where millions (ok, thousands) of users evolve shapes that look like whatever they want, but likely with a strong bias toward shapes recognizable to humans. Over time, many common motifs arise:

http://endlessforms.com/


Problem is, even if you succeed and end up with a fabricated picture that fools human neural nets into believing it's a picture of a panda, how would you tell it's not really a picture of a panda?

You'd need another classifier to tell you "nope it's actually just random noise and shapes" ... hm.


Probably not hard - just engage the higher cognitive functions. "Does it look like noise? Yeah." Or just close one eye and look again.


I think you missed my somewhat deeper philosophical point :)

Who gets to decide what is really a picture of a panda?

If we'd manage to craft a picture that could with very high certainty trick human neural nets (for the sake of argument, including those higher cognitive functions) into believing something is a picture of a panda, "except it actually really isn't", what does that even mean?

Human insists it's a picture of a panda, computer classifier maintains it's noise and shapes.

Who is right? :)


Interesting, sure. But I started out wondering if some obviously-noise picture could be found that fooled humans, at least at first glance. "Hey a panda! Wait, what was I thinking, that's just noise!" It would be weird and cool, on the order of the dress meme etc. but much more so.

Kind of like the memes in Snow Crash, ancient forgotten symbols that make up the kernel of human thought.


Like a picture of a blue and black dress?


They would look like optical illusions!


Can they create human equivalents?

That would require that human vision works in the fashion of neural networks/svms/etc. There actually isn't any evidence for this.


What?! Neural networks are explicitly designed to work in the fashion of - you guessed it, neural networks. Like, retina and optic nerve.


"In modern software implementations of artificial neural networks, the approach inspired by biology has been largely abandoned for a more practical approach based on statistics and signal processing." [1]

[1] http://en.wikipedia.org/wiki/Artificial_neural_network


The embodiment is changed of course. But the process of successive ranks of weighted accumulators has not been. Which is the neural model. Of course it's different. But there's still the question of: could failure modes of the mathematical model be present in the biological one? It's a question of modeling, not wetware vs hardware.


But that model is not the way the overall activity of the brain is currently understood - neurons are seen as being far more complex than simple threshold machines. They may involve threshold effects, but the claim that they work overall like any version of artificial neural networks is no longer supported by anyone.


Ha. Jason and I were typing at the same time. :-)


Of course, this requires that the attacker have complete knowledge of all the weights of the given network.


No, it does not. See my reply to similar comments.


A paper came out that explains this effect and a method of minimizing it: http://arxiv.org/abs/1412.6572

Basically neural networks and many other machine learning methods are highly linear and continuous. So changing an input just slightly should change the output just slightly. If you change all of the inputs slightly in just the right directions, you can manipulate the output arbitrarily.

These images are highly optimized for this effect and unlikely to occur by random chance. Adding random noise to images doesn't seem to cause it, because for every pixel changed in the right direction, another is changed in the wrong direction.

The researchers found a quick method of generating these images, and found that training on them improved the net a lot. Not just on the adversarial examples.
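The quick method from that paper is the fast gradient sign method; roughly (a hedged sketch, with grad_wrt_input standing in for whatever backprop call your framework provides):

    import numpy as np

    def fgsm_example(x, y, grad_wrt_input, eps=0.007):
        # Nudge every input dimension by +/- eps in the direction that increases
        # the loss (Goodfellow et al. 2014). grad_wrt_input(x, y) is assumed to
        # return dLoss/dx for your model.
        return np.clip(x + eps * np.sign(grad_wrt_input(x, y)), 0.0, 1.0)

    def adversarially_augmented_batch(xs, ys, grad_wrt_input, eps=0.007):
        # Pair each clean example with its perturbed counterpart so the network
        # also trains on adversarial inputs, as the paper suggests.
        xs_adv = np.stack([fgsm_example(x, y, grad_wrt_input, eps)
                           for x, y in zip(xs, ys)])
        return np.concatenate([xs, xs_adv]), np.concatenate([ys, ys])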


> Basically neural networks and many other machine learning methods are highly linear

No they're not! You introduce non-linearities like the sigmoid or tanh to make them highly non-linear.


I was thinking the same thing until I scanned through the paper linked above. While neural networks are indeed non-linear, some NNs can still exhibit what amounts to linearity and suffer from adversarial linear perturbations. Example of linearity in NNs that the authors are considering from the paper:

>The linear view of adversarial examples suggests a fast way of generating them. We hypothesize that neural networks are too linear to resist linear adversarial perturbation. LSTMs (Hochreiter & Schmidhuber, 1997), ReLUs (Jarrett et al., 2009; Glorot et al., 2011), and maxout networks (Goodfellow et al., 2013c) are all intentionally designed to behave in very linear ways, so that they are easier to optimize. More nonlinear models such as sigmoid networks are carefully tuned to spend most of their time in the non-saturating, more linear regime for the same reason. This linear behavior suggests that cheap, analytical perturbations of a linear model should also damage neural networks.


Right. The basic idea is something like the transition (manifolds) from doing special relativity to general relativity. The special "linear" says that given two inputs x and y to a function f, f is linear if f(x + y) = f(x) ⊕ f(y) for two operations +, ⊕. The general "linear" says that f(x + δx) = f(x) ⊕ δf(x, δx) for some small perturbations δx in the vicinity of x.

If x is a bit-vector then this can be as simple as saying "flip one bit of the input and here's how to predict which output bits get flipped." When you're building a hash function in cryptography, you try to push the algorithm towards a non-answer here: about half the bits should get flipped, and you shouldn't be able to predict which they are. But of course there's a security vulnerability even if + and ⊕ are not XORs.

Resisting "adversarial perturbation" in this context means basically that neural nets need to behave a bit more like hash functions, otherwise they will confuse the heck out of us. The problem is that if you just took the core lesson of hash functions -- create some sort of "round function" `r` so that the result is r(r(r(...r(x, 1)..., n - 2), n - 1), n) -- seems like it'd be really hard to invent learning algorithms to tune.


Most image recognition neural networks use ReLU activations, which give a purely linear output unless the input is below zero. Even sigmoids/tanh are most linear in the middle region, and weight penalties are used to keep the weights small so they stay in that region.

But it doesn't really matter what activation function you use. The paper argues that it's the linear layers between the nonlinearities that are the problem.
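A toy demonstration of why that matters (made-up weights): the same per-pixel epsilon moves a linear response by eps * sum(|w|), which grows with the input dimension, so in image-sized spaces a change that is imperceptible per pixel can still swing the score by a lot.

    import numpy as np

    rng = np.random.RandomState(0)
    for d in (10, 1000, 100000):               # input dimensionality
        w = rng.randn(d)                       # one row of a linear layer
        x = rng.rand(d)
        eps = 0.01                             # tiny change per input dimension
        delta = eps * np.sign(w)               # worst-case small perturbation
        print(d, w.dot(x + delta) - w.dot(x))  # equals eps * sum(|w|); grows with d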


Everything non-linear is linear to a first-order Taylor expansion. Hence you can evolve small perturbations; the same thing is used for explicit numerical integration of non-linear equations.


Nobody is claiming that these changes are anything but astronomically unlikely to be produced by random processes; if that were the concern, the title should have read "... raises robustness concerns" instead of security concerns.

In every security system, it is assumed that if there is an attack surface, sooner or later an intelligent adversary will come and exploit it. And there is a long precedent saying that if you see the word "linear" anywhere in the attack surface description, the adversary is bound to come sooner rather than later.


You say that these images are highly optimized to produce this effect and would not occur by chance, but have you looked at the images in the "fooling" paper?

http://www.evolvingai.org/fooling

Some of them are very simple, and DO occur a lot in the world. For example, the alternating yellow and black line pattern would be encountered by a driverless car, and it would think it is seeing a school bus.


>Some of them are very simple, and DO occur a lot in the world. For example, the alternating yellow and black line pattern would be encountered by a driverless car, and it would think it is seeing a school bus.

While the image shows a yellow and black line pattern to us, are you sure this is also what the CNN "sees"? Couldn't this image just be the same as the adversarial images, i.e. it responds to many small input values rather than the overall pattern?

If it's possible to make the CNN predict an ostrich for an image of a car, then the same can be done of an image of an alternating yellow and black line pattern, no?


This raises an interesting question. Do we want computers to see "correctly", or to see how we see?

Would a preferred computer vision system experience the Checker shadow illusion? http://en.wikipedia.org/wiki/Checker_shadow_illusion

If yes, computer vision will be as fallible as ours. If no, then there will always be examples, like the ones presented, where computers will see something different than humans.


The question is, what do you mean by "correctly"?

When we look at the Checker Shadow Illusion, our brains are automatically "parsing" that image into a 3D scene and compensating for the lighting and shadowing. The reason you see square B as being lighter than A is because, if you could reach into the image and remove the cylinder so it's not casting a shadow anymore, square B would be lighter than A. Our brain doesn't think about color in terms of absolute hex values. Instead it tries to compensate for the lighting and positioning of elements, assuming that they're similar to what we would see in real life—a challenge which, by the way, computer vision systems have always struggled with.

That sounds pretty "correct" to me.


A computer vision system can have multiple ways of processing an image. So at the limit, it could interpret a scene in terms of what a human sees and also have a separate, better understanding of the scene.


The OP shows that computers DO NOT have a better understanding. It's evident they have no understanding at all; they are simply doing math on pixels and latching on to coincidental patterns of color or shading.

People recognize things by building a 3D model in their head, then comparing that to billions of experiential models, finding a match and then using cognition to test that match. "Is that a bird? No, it's just a pattern of dog droppings smeared on a bench. Ha ha!"


How could I have better put "So at the limit, it could"?

I meant to talk about what some hypothetical future system could do (which I think was a reasonable context given the comment I replied to), not to characterize current systems.


Sorry, I re-read and see that.

To get there, computers will clearly have to utterly change their approach. A cascaded approach of quick math followed by a more 'cognitive' approach on possible matches could definitely improve on the current state of affairs.


>People recognize things by building a 3D model in their head, then comparing that to billions of experiential models, finding a match and then using cognition to test that match. "Is that a bird? No, it's just a pattern of dog droppings smeared on a bench. Ha ha!"

So you're saying people are generative reasoners with very general hypothesis classes rather than discriminative learners with tiny hypothesis classes.

To which the obvious response is, yes, we know that. The question is how to make some computerization of general, generative learning work fast and well.


People are far more than that. Lots of our brain is dedicated to visual modeling. Those 'hypothesis classes' are just the tip of the iceberg. For computers, they're the whole enchilada. To mix metaphors.


I don't think your description of how humans recognize things is true. You can do object recognition of a silhouette, or a 2D image, or an impressionistic rendering. http://www.sciencedaily.com/releases/2009/04/090429132231.ht...


We can't help but build real models of what we see - our retina/optic nerve are already doing this before our brain even receives the 'image'!

I can't help but believe some of the image recognition mentioned in your article, especially of icons, is built through previous experience with similar iconic images. Symbols for things become associated with the real things. It's a modern adaptation of a much older processing mechanism.


OK... but how is that pattern-matching different from what the computer is doing? Why is human pattern-matching "understanding" and computer pattern-matching not?


It's the 2nd stage of cognitive engagement that makes humans different. Of course a field of static isn't a panda. The computer has no capacity to recognize the context.


I think I get your point now. It's OK if a human momentarily mistakes a random blob for a panda, but they should be able to figure out from other visual cues and context that it's not a panda. And it's that second part that's missing from the computer models?


That's it. Both consciously and subconsciously - lots of image filtering goes on that we're not even aware of.


That would be interesting; it could flag inputs that are ambiguous to humans but not machines (or vice versa, or when there's a discrepancy at all) since it could suggest that something shady is happening.


You can always fool any system into a paradox. Very grossly speaking, this comes out of Gödel's incompleteness theorem; that with any set of laws you can always get P=~P out of the set. How this paradox looks, acts, or feels is interesting and possibly artistic, as the OP shows. If anything, I think there is a bit of beauty, art, and cleverness in that.


That isn't quite right. With a sufficiently powerful formal system, you're forced to either have inconsistency or incompleteness - you're describing a system that is inconsistent. It's usually much better to have consistency and to sacrifice completeness. Then you'll have Ps that are true but unprovable, but at least you won't have P=~P which makes the system rather useless.


What does true but unprovable mean? What happens if you take such a proposition, negate it and add as an axiom?


I'm not a logician by a long shot, so I probably can't explain that correctly. I think Gödel found a way to make a logical proposition refer to itself, and then found a way to assert provability. He could then construct the sentence "this sentence is not provable". He showed that such a sentence must exist within any system of sufficient power. Thus the system must be self-contradictory (inconsistent), or the sentence must be true (and the system must be incomplete). I'm not sure if such a sentence still refers to itself when negated, so I can't answer the last one.


My understanding of the incompleteness theorem is that, for a given set of axioms, there will be unprovably true things. Changing the axioms would change which things were unprovable.

That being said, here is a much better resource than I am: http://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_t...


Are all humans susceptible to the checker shadow illusion? Or just those that have been trained to interpret flat arrangements of color as accurate representations of 3D scenes and objects? If you showed the checker illusion to someone who had never seen a photograph or representative painting, would they see different colors or the same color?

I don't know the answer. I do find it fascinating that something as simple as perspective (which we take for granted in a graphic like the checkers illusion) is a fairly recent technique, invented by Renaissance artists.


With good lighting control, you could set up the checkerboard illusion in the real world. That's part of the point.


Does it matter if we want them to do it or not? NSA and China will build them to do that anyway.


It is good to know that they need access to a lot of predictions from a net, before they can create an image that will "fool" the net, but look alien to humans. Secondly, this doesn't account for ensembling: "fool me once, shame on you. Fool me twice...". Since the images are crafted for a single net, a majority vote should not be fooled by these images. I suspect this effect rapidly goes away when adding more nets (which is basically industry-standard practice to increase accuracy).

Furthermore, I am seeing the security concerns, but I figure this is far from a practical attack. Deep Learning Classifiers do not act as gatekeepers: You have not much to gain from a single faulty classification. You won't be granted access to secret information if you happen to look like the CEO.


Furthermore, I am seeing the security concerns, but I figure this is far from a practical attack

Perhaps it's a sign of the times that almost every discovery that could possibly be related to security in some way gets framed that way. I have a feeling that if this was a decade or two ago, the sentiment would be very different. ("Can you figure out what a computer thinks these images are?")

Also, the image labeled "baseball" immediately reminded me of a baseball...


"Secondly, this doesn't account for ensembling: "fool me once, shame on you. Fool me twice...". Since the images are crafted for a single net, a majority vote should not be fooled by these images."

Trivially "solved" by treating the ensemble as a single object, then constructing a counterexample. My intuition suggests that while the resulting "fooled you" image may very slowly converge on something human-recognizable, it won't do so at a computationally useful rate.
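Concretely, "treating the ensemble as a single object" can mean something as simple as wrapping the members behind one averaged scoring function and pointing the same black-box search at it (a hypothetical sketch; member_confidences is a list of per-model scoring callables):

    import numpy as np

    def ensemble_confidence(member_confidences, img, target_class):
        # Average the members' scores so the whole ensemble looks like one
        # classifier to the attack (majority vote would work similarly).
        return np.mean([conf(img, target_class) for conf in member_confidences])

    # Any single-net search (evolutionary, hill climbing, numerical gradient)
    # can then be run against ensemble_confidence instead of one net's output.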


Now I have to try this out. My intuition tells me it becomes increasingly hard to create a fooling image, which looks alien, and is able to fool all the nets in the ensemble, even though they have different settings and params. I think they can only fool one net at a time, and have to get very lucky to be able to evolve the image for the other nets, while keeping the same classification. You can't "train" these images on all nets at once, by simply treating the ensemble output as a single net.

If your intuition is right though, then the ensemble may be able to counter with a random selection of nets for its vote: You'd need to evolve images for every possible combination and/or account for nets added in the future.


Overfitting has nothing to do with it. See this paper: http://arxiv.org/abs/1412.6572

I believe the original paper tried ensembles and even got the images to work on different networks.


I was not talking about overfitting. I've seen that paper.

The original paper asked if images that could fool DBN.a could fool DBN.b. The answer was: certainly not all the time. They used the exact same train set and architecture for DBN.a and DBN.b, just randomly varied initial weights. I think this is too favorable for a comparison with a voting ensemble made with nets with a different architecture, train set and tuning. Can they also find images that can fool DBN.a-z?

Also, to test if a net can learn to recognize these fooling images, they simply add them to the train sets. Those noisy images would be far simpler to detect: they have a much greater complexity than natural images. To detect the artsy images, a quick k-nearest-neighbors run should show that they do not look much like anything the net has seen before, so the input may be an adversarial image.
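
A rough sketch of that k-nearest-neighbors check, assuming `train_feats` and `query_feat` are feature vectors (e.g. from the net's penultimate layer); the scikit-learn calls are real, but the threshold is invented and would need tuning:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def looks_adversarial(train_feats, query_feat, k=5, threshold=10.0):
        nn = NearestNeighbors(n_neighbors=k).fit(train_feats)
        dists, _ = nn.kneighbors(query_feat.reshape(1, -1))
        # If even the closest training examples are far away, the input does not
        # resemble anything the net has seen -- flag it as suspicious.
        return dists.mean() > threshold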


To be clear I meant this paper (http://arxiv.org/abs/1312.6199) as the original paper for adversarial images. I think they did try transferring them between very different NNs:

>In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.


Very interesting. Thank you for correcting me.

> a relatively large fraction of examples will be misclassified by networks trained from scratch with different hyper-parameters (number of layers, regularization or initial weights). The above observations suggest that adversarial examples are somewhat universal...


Surprisingly, ensembles do not really help. We have tried this and it does not work (the final paper for CVPR 2015 will show these results). Also see the work of Szegedy et al. and Goodfellow et al., which also shows that ensembles do not really help.


Also, you don't necessarily need a lot of predictions from a net to fool it, because (another surprising result) images that fool one net tend to fool others! So I can create fooling images on my in-house net and then take them and fool your net-used-for-some-important-application without getting any feedback from your net. That's very surprising, and does raise serious security concerns.


> Since the images are crafted for a single net, a majority vote should not be fooled by these images. I suspect this effect rapidly goes away when adding more nets

Sure, but this assumes that whatever neural net system you're relying on was bought by someone who is more security conscious than they are cheap.


> You won't be granted access to secret information if you happen to look like the CEO.

Tell that to this startup: http://onevisage.com/ovi-technology-page/


Very interesting. First of all, here's a YouTube video associated with the paper -- https://www.youtube.com/watch?v=M2IebCN9Ht4 . Second, some on here have posted about the Goodfellow, Shlens, and Szegedy paper http://arxiv.org/abs/1412.6572 which discusses the opposite effect. The Szegedy research is mentioned in the Nguyen paper and specifies that given an image that is correctly classified by a DNN, you can alter that image in a way imperceptible to a human to create a new image that will be INCORRECTLY classified. The Nguyen, Yosinski, et al. work that's the subject of this post states that given a DNN that correctly classifies a particular image, you can construct a gibberish image that the DNN will classify with the same label.

Both results are interesting from the point of view of DNN construction, and there have been some papers suggesting ways to counter the effects specified in the Szegedy research. In practice (as others have mentioned), in order to construct an exploit similar to the one described in this post, you'd need to have a lot of knowledge about the DNN (e.g. weights) that an external attacker wouldn't have.

What this does leave open, though, is a disturbing way for someone with internal access to a DNN doing important work (e.g. object recognition in a self-driving car) to cause significant damage.
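
For reference, the perturbation attack from the Goodfellow/Shlens/Szegedy paper is roughly a one-liner once you can get gradients out of the net. A hedged numpy sketch, where `grad_loss_wrt_input` is a hypothetical handle to the network's gradient with respect to its input:

    import numpy as np

    def fgsm_perturb(x, y, grad_loss_wrt_input, epsilon=0.007):
        # x: input image with values in [0, 1]; y: its correct label.
        # Nudge every pixel slightly in the direction that increases the loss;
        # for small epsilon the change is imperceptible to a human viewer.
        g = grad_loss_wrt_input(x, y)
        return np.clip(x + epsilon * np.sign(g), 0, 1)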


Not directly related, but I was at a security-related convention and overheard some people talking about an image that, when occupying <3/4 of a frame, will crash any digital camera (phone, DSLR, IP camera). Does anyone know any more information about this image and effect? I imagine it's a bug in some low-level firmware of a common IC for digital photography DSP, but I'm very unfamiliar with digital cameras. It also could have been complete bunk, because I've not heard of it since and it wasn't being showcased at the convention.


Not what you're looking for, but your comment reminded me of this: https://en.wikipedia.org/wiki/BLIT_%28short_story%29



It's most likely not the constellation itself, but the other mechanism:

https://en.wikipedia.org/wiki/EURion_constellation#Other_ban...

In particular, have a look at this:

http://www.cl.cam.ac.uk/~sjm217/projects/currency/


If there is such a thing, I'm more inclined to believe that it's an analogue effect rather than a bug in software/firmware. It's similar to the effect that causes some monitors to emit audible sound when displaying certain images: https://news.ycombinator.com/item?id=8862689

Digital camera sensors output a stream of bits that depend on the intensity of the light reaching the pixels. For normal images, there is (relatively speaking) not so much contrast, so the signal has few high-frequency or repetitive components to it. However, if you point the sensor at an image that effectively causes each pixel to be alternating from full dark to full bright, the signal becomes far more regular and the high-frequency components increase significantly. A possible problem is that, since the bulk of the power draw occurs when a bit transitions from 0-1/1-0, this repetitive and high-frequency signal causes more stress on the power supply circuitry (look up "voltage regulator oscillation"), and if it causes voltages to go out of tolerance, can crash the system. I suppose an image that produced a stream of 010101010... for each pixel's value could also be an example of this. Other resonant effects may also play a role in this; if the oscillation frequency happens to synchronise with something else, physical damage is a possibility if the components are pushed beyond absolute maximum ratings. It's an extreme edge case, not normally encountered in use.

The reason why I think this could be plausible is that, although I've never tried/experienced this with a digital camera, I had an old analogue video camera that would work fine in all circumstances except when pointed at a monitor displaying its image, upon which it would emit a loud high-pitched whine and then shut itself off. I discovered that one of the power supply rails would go into oscillation when the camera saw itself, and this was enough to shut down the system. Adding some extra supply decoupling was enough to stop this from happening, but apparently this problem has been known for a while:

http://en.wikipedia.org/wiki/Video_feedback


Flying RPG?

Now seriously, modern cameras have face recognition and a ton of other features. It should be possible to cause a crash.


The pattern-based illusions are actually quite interesting, almost artistic. Half of them are recognizable to humans, the other half at least make sense when identified. I wonder if we can automate the production of postmodern art :)


In fact, we submitted them to an art competition and they were accepted and put up in a museum!

http://www.evolvingai.org/share/20150213_184537.jpg


Deep Neo Neo Cubism?


Peter Watts mentioned this potential problem in his Rifters series; one explicit example was a neural net that ran a train and was trained via a series of inputs, one of which was a clock in a train station; one day, the clock broke, and the neural net took some action that ended up killing all the passengers. (I forget the details.)

Which is not to say that we should all fear computers more than humans as a consequence; we do inexplicable things, too.


"“We realized that the neural nets did not encode knowledge necessary to produce an image of a fire truck, only the knowledge necessary to tell fire trucks apart from other classes,” [Yosinski] explained."

This seems markedly different from biological neural networks. Is the difference one of network structure/algorithm or rather the fact that biological neural networks (in human image processing) actually have time and space to learn a lot about each individual image class?


Deep nets are only loosely inspired by neurobiology. That's why LeCun calls them "convolutional nets" and not "convolutional neural nets" and prefers "nodes" over "neurons".

It is, however, possible to have a deep net produce 3D models/images: https://www.youtube.com/watch?v=QCSW4isBDL0 "Learning to Generate Chairs with Convolutional Neural Networks".

I also suspect a different part of cognition is used when humans are asked to recreate a "fire truck" than when humans are asked to classify a "fire truck" from a "car". The former seems closer to using memory ("what did the last five fire trucks I saw look like?"). A fairly recent addition to deep nets is making use of memory: http://arxiv.org/pdf/1410.5401.pdf "Neural Turing Machines". So the difference may quickly become less significant.


This is not so different from recognizing images in their Fourier frequency domain. The frequency features and their origins in the spatial domain can be made very unintuitive.

But I'm not clear how important this phenomenon really is to the practice of CV, since 1) 'spoofed' images are highly specific to each DNN being used, and 2) a trivial reality check of the image can always 'out' examples like these.


Your 2nd point is critical: you can filter these images easily before even running them through the DNN. However, researchers are also interested in why it is possible to spoof NNs in general. The typical response of 'overfitting' is being questioned.

Also the question is raised as to whether or not new methods of spoofing are possible that aren't so easily detectable.
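
One naive version of such a pre-filter, purely illustrative (the low-frequency cutoff and the threshold are invented), compares an image's high-frequency energy against what natural photos typically show:

    import numpy as np

    def high_freq_ratio(img):
        # img: 2D grayscale array with values in [0, 1].
        spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
        h, w = img.shape
        cy, cx = h // 2, w // 2
        y, x = np.ogrid[:h, :w]
        # Energy outside a small low-frequency disc, relative to total energy.
        high = spectrum[(y - cy) ** 2 + (x - cx) ** 2 > (min(h, w) // 8) ** 2].sum()
        return high / spectrum.sum()

    def probably_noise(img, threshold=0.6):
        return high_freq_ratio(img) > threshold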


> But computers don’t process images the way humans do, Yosinski said.

This means that come the singularity AIs will have to use AI specific CAPTCHAs in order to distinguish between humans (aided by dumb computers) and other AIs.


Which of the following would you most prefer? A: a puppy, B: a pretty flower from your sweetie, or C: a large properly formatted data file?


Welp, it's official. I'm a computer.


No, you're the maintenance guy. Computers don't care about formatting. The correct answer is A, because "A" is drawn with just three straight lines and computers like straight lines.


But it arguably takes more data to encode an A - three lines, which means six endpoints, plus whatever signifies the command "draw a straight line." At minimum, that's seven pieces of data.

A C, however, can be drawn as half of a circle - one command to draw an arc, a center point, a radius, and the start and stop angles. Five pieces of data.

(This is assuming, of course, that computers prefer minimal amounts of data. If that's wrong, then the computer would obviously prefer B. You need more data to describe it.)


Two of the A endpoints are the same, so they don't need to be encoded twice.


If we have a singularity AI then we don't need CAPTCHAs: for all intents and purposes the AI is equal or superior to humans, so there is no distinction to be made. The Turing Test is a form of CAPTCHA.

Also, one could make CAPTCHAs incredibly hard. Humans and dumb computers won't be able to solve them, and singularity AIs get a pass. So give access to anyone who isn't able to solve the CAPTCHA and redirect the singularity AIs to google.com :).

Finally, the singularity AI could create a CAPTCHA which separates humans from AI. The problem of separating humans from AI will quickly become too difficult for humans to solve.


It makes me chuckle thinking about machine intelligence being able to solve a captcha when humans can't.

I get a mental image of a future where computers try to keep us pesky humans out of their websites.


Cool! And it would be so easy. Just show one of those noisy thumbnails, and see if the human can identify it as a 'panda'


That's actually a fun-sounding research project: train humans to classify those images. I'd be very interested to know if humans could learn to classify either set of images. They might not look like a 'panda' to us, but there is some underlying pattern that a machine can pick out and apply the arbitrary label 'panda' to. Can a human learn that same pattern?


Maybe so. But I was thinking, maybe there're weird patterns that human pattern-matching will erroneously classify as well. Probably unconsciously - if we look again, we'd say 'huh, that's just a mess of pixels'. So you'd have to flash them briefly and ask for an instant answer or something, to catch the pattern matcher before the cognitive check got done?


Cool! Reminds me of "Shazam Decoys" where barely audible or inaudible energy can be added to a signal to fool Shazam into identifying it as the wrong track.

I've often thought there would be an awesome opportunity in there to make a hilarious app that catches cheaters during the music round of Pub Quiz.


> I've often thought there would be an awesome opportunity in there to make a hilarious app that catches cheaters during the music round of Pub Quiz.

The best pub quizzes dim the lights so that smartphone cheats beam forth their sneakiness.


This reminds me of this tool-assisted speedrun: https://www.youtube.com/watch?v=GOfcvPf-22k

The image recognition in Brain Age is obviously much simpler, but it's still basically the same idea.



One of the interesting things was the 'white noise' which was identified as various animals. It reminded me of people looking at noise and "seeing" data, which for me suggests that at some level this isn't completely an artifact. If algorithms so closely modeled on human perception are susceptible to this sort of thing, humans probably are too. Perhaps that explains reports of people seeing things in the electronic 'snow' pattern of a disconnected TV?


The difference is that humans know they are seeing noise, they don't claim 99% confidence in their image, but rather a very low confidence.


It's probably very naive, but that makes me wonder whether these neural nets are trained to recognize noise or meaningless images as such. If we train a system to tell us what an image represents, the system will do its best to classify it into one of the existing categories. But having low confidence in what an image represents is not the same as having high confidence in the fact that it doesn't represent anything. So maybe we should train the networks to give negative answers, like "I'm totally confident that this image is just noise".
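
One crude way to set that up (just a sketch of the idea, not what the paper's authors did): add an explicit "this is nothing" class and pad the training set with pure-noise examples labeled as such, so the net can be confidently negative:

    import numpy as np

    def add_noise_class(train_images, train_labels, n_classes, n_noise=5000):
        # New class id n_classes means "this image is just noise".
        shape = train_images.shape[1:]
        noise = np.random.rand(n_noise, *shape).astype(train_images.dtype)
        noise_labels = np.full(n_noise, n_classes, dtype=train_labels.dtype)
        images = np.concatenate([train_images, noise])
        labels = np.concatenate([train_labels, noise_labels])
        return images, labels, n_classes + 1  # classifier now needs one extra output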


They can only fool the DNN because they know its weights. In theory, you could do the same for people's brains.

As long as you don't publish the specs of your net you should be fine I guess.


If you have an oracle, you don't need to know the net specs.

An oracle, here, would be any version of the system that you can query against repeatedly without suffering too much of a penalty.


That's a good point! But your proposed solution is security by obscurity. What if a rogue employee trained a backdoor into your DNN? What if the NSA asked vendors for a master key?

The purpose of this research is to work as a proof of concept. Sure, this iteration needs to cheat slightly to achieve its results. However, looking at the images that aren't just noise, it seems to be possible to construct less specialized images that fool many nets.


The paper does not use the weights of the network to produce (nearly all of) these images (see the responses to Karpathy above).


> In theory, you could do the same for people's brains.

No, because the instant you used an image that was close but wrong, the human brain would retrain.

I wonder if the neural net can do the same - try this experiment on an active neural net and let it train itself (i.e. don't tell it it's being faked, let it figure it out and then correct for it).


No, the images generated worked on even different neural networks trained on different data.


I think that the major problem with CV is that it only recognizes images in isolation from each other. Humans understand what they are looking at by finding the concept that lies at the intersection of all the small ideas in the image. For example, a human would recognize a keyboard because it contains a "means of input" on which there are "symbols", specifically the "alphabet", arranged in a "logical format" ("QWERTYUIOP"), which they know is the sign of a keyboard. If a human were to see a keyboard that looks different from most, they can still make the inference that it is a keyboard by understanding the underlying concepts of what they see.

On the other hand, a computer mechanically relates the specific format of a keyboard to the word "keyboard." It fuzzy matches the pixels of images to extract the object in the image: not the individual ideas implicit in the image.

Computer Vision needs more depth to actually be considered vision.


If you ever work with classifier training, these results are not surprising. You can take all the false positives generated by a classifier and average them, and you will come up with an image that resembles the object to be recognized.
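
In numpy it's literally just a mean over the stack of false-positive patches (illustrative only; assumes the patches are equally sized grayscale arrays):

    import numpy as np

    def average_false_positives(patches):
        # Viewing the result often reveals a blurry template of the target object.
        return np.array(patches, dtype=np.float64).mean(axis=0)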


I know many here know this already, but I'll say it anyway since I ran into people with a misconception related to this: the particular misidentifications are specific to the algorithm and training set used; it's not like all computers recognize those blocks as a cheetah or whatnot.

We use ML-based computer vision at my work, so I have a bit of experience here. I think the biggest practical takeaway from the observation that ML can give some wonky results is that an ML system can be a real PITA to debug.


The whole idea is to tear down the algorithm, simulate an unnatural image that triggers the required responses, and pass it to the classifier. There is no direct link to security risks here.


This isn't very groundbreaking; they're simply discovering the inherent weaknesses in neural net learning. We've known for years that neural nets have inherent gaps in their training, and this is just taking advantage of that. This isn't really an issue in a real-life scenario, as you wouldn't be able to determine which images produce faulty results without thousands of iterations.


Well, obviously they should've checked with you before publishing their paper.


If you look really closely at the noisy images, you can see little blotches that vaguely resemble the things the computer recognized them as.


Yeah, they say the algorithm erroneously sees an armadillo. But wait, I actually see an armadillo, and I actually see a centipede.


Machine pareidolia?

On a serious note, couldn't we keep training the same DNNs using these white-noise images as negative examples?


This paper tried that with positive results: http://arxiv.org/abs/1412.6572


That point was covered in the article.

> In a further step, the researchers tried "retraining" the DNN by showing it fooling images and labeling them as such. This produced some improvement, but the researchers said that even these new, retrained networks often could be fooled.


This makes me wonder if we might be flirting with the computational equivalent of autism.

Could autistic children have learning impairment due to their inability to correctly sort/segregate stimuli, in the same way that these neural networks generate high-confidence false-positives?


I'm not sure I really understand the implications of this. It doesn't seem like this is an inherent weakness in computer recognition of images, but instead a weakness of a particular DNN? Or am I way off base?


How effective will this be against Intel's "RealSense 3D cameras"? Is it, as I've already assumed, just a matter of time before that technology can be fooled too?


Who decided to name it a "lesser panda"? And of the alternative names for a red panda, firefox, or red cat-bear why would someone choose "lesser panda"?


That is a question for the ages. Who decided on "cool ranch flavor" for Doritos? And what's a "cool ranch"? Or "blue raspberry", a common flavor for candy these days, or "TV Spokesmodel" (https://www.google.com/?gws_rd=ssl#q=TV+spokesmodel). There's lots of weird-beard names that get applied to stuff, and nobody has a say in those names.


Maybe this is the algorithm making art


See above. These images were accepted to a juried art competition and posted in a museum:

http://www.evolvingai.org/share/20150213_184537.jpg


The awards only acknowledge, not express, what it has achieved.

The algorithm would be useful if it could generate/identify images that feel as emotionally transformative as good art.


I quite like this thought, though art is about combining known patterns in a novel way. Here they create unknown patterns to evoke associations with known but unrelated patterns. It's kind of reverse-art.


I think art is about the relationship of the viewer to the work, not about the explanation for the work's creation. The viewer is the place where it all goes down.


Does anyone have higher resolution versions of the images used in the article?


Here they are: http://www.evolvingai.org/fooling

There is also a video summary of the paper there.



