Adversarial Learning for Good: On Deep Learning Blindspots (kjamistan.com)
78 points by doener on Dec 29, 2017 | 12 comments



The video example of a turtle being misclassified as a rifle is pretty scary https://www.youtube.com/watch?v=piYnd_wYlT8

We're approaching a world where AI will be more and more relied upon in dangerous situations. Imagine someone getting killed for something ridiculous like inadvertently holding an adversarial example. Public trust would have a hard time recovering.


Would adding random noise to an image before running it through the classifier mitigate these kinds of attacks?


No, ironically, because classifiers are trained to be robust to noise. An adversarial example is generated by pushing the original image along a specific direction in a high-dimensional input space. Random noise is vanishingly unlikely to undo that perturbation, so the classification stays wrong.
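
To make that concrete, here's a minimal sketch (mine, not from the talk) of the usual FGSM-style attack: the adversarial step moves the input along the sign of the loss gradient, while uniform random noise of the same magnitude points in an essentially unrelated direction, so it rarely restores the original label. The `model`, `image`, and `label` below are untrained placeholders, only meant to show the mechanics.

```python
# Sketch: why random noise rarely undoes an adversarial perturbation.
# The model and input are toy placeholders, not a real trained classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Placeholder classifier and input (stand-ins for a real model/image).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32)
label = torch.tensor([3])
eps = 0.03  # perturbation budget

# Fast Gradient Sign Method: one step along the sign of the input gradient.
image.requires_grad_(True)
loss = F.cross_entropy(model(image), label)
loss.backward()
adversarial = (image + eps * image.grad.sign()).detach().clamp(0, 1)

# Random noise of the same magnitude almost never points back along the
# adversarial direction, so the (mis)classification usually survives it.
noisy = (adversarial + eps * (2 * torch.rand_like(adversarial) - 1)).clamp(0, 1)

with torch.no_grad():
    print("clean prediction:      ", model(image).argmax(1).item())
    print("adversarial prediction:", model(adversarial).argmax(1).item())
    print("adversarial + noise:   ", model(noisy).argmax(1).item())
```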


>Imagine someone getting killed for something ridiculous like inadvertently holding an adversarial example.

This would never happen with humans, right? Of course it would.


With humans there is accountability (in theory). Who is accountable for the model?


A model is adjustable, correctable, trainable, and improvable. Humans are much less so.


Re: adversarial ML & steganography. There were 2 papers at this year's NIPS studying these ideas (http://papers.nips.cc/paper/6802-hiding-images-in-plain-sigh... and http://papers.nips.cc/paper/6791-generating-steganographic-i...)



I wonder how sensitive these results are. I mean, if you run one more training round on the network, does the rifle immediately turn back into a turtle in the eyes of the network, so that you would need to generate a new adversarial turtle for every single network? Or does that turtle look like a rifle to all current neural networks? Or, most likely, is it somewhere in between?


If you don't change the data, why would the network change its results after just a few more training iterations? It has already converged.


Because, if I understood anything correctly, the method tries to find the smallest possible changes that cause the network to make the incorrect classification. And those smallest possible changes just might be very sensitive to whatever small random ripples in the weights a network has. But I am far from a neural network expert, so I can't really answer.
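
For what it's worth, a toy version of that experiment is easy to set up: craft an example against one network, then check whether it still fools that network after a few extra training steps, and whether it transfers to an independently trained copy. Everything below (tiny linear models, random data, the 0.03 budget) is a placeholder; it only shows the shape of the test, it doesn't answer the question.

```python
# Sketch: probe how brittle an adversarial example is by (a) taking a few
# more SGD steps on the same model and (b) trying it on an independently
# trained copy. Toy placeholders throughout: random data, tiny linear models.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def make_model():
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

def train(model, data, labels, steps):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(data), labels).backward()
        opt.step()

# Toy dataset and two independently initialized/trained models.
data, labels = torch.rand(64, 3, 32, 32), torch.randint(0, 10, (64,))
model_a, model_b = make_model(), make_model()
train(model_a, data, labels, steps=50)
train(model_b, data, labels, steps=50)

# Craft an FGSM example against model_a for one training image.
x, y = data[:1].clone().requires_grad_(True), labels[:1]
F.cross_entropy(model_a(x), y).backward()
adv = (x + 0.03 * x.grad.sign()).detach().clamp(0, 1)

def fooled(model):
    return model(adv).argmax(1).item() != y.item()

print("fools model_a right away:", fooled(model_a))
train(model_a, data, labels, steps=10)      # a few more learning rounds
print("fools model_a after extra training:", fooled(model_a))
print("transfers to model_b:", fooled(model_b))
```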


I think it would be beneficial to explicitly mention model extraction attacks since they (kind of) enable such attacks: https://news.ycombinator.com/item?id=12557782



