Adversarial Learning for Good: On Deep Learning Blindspots (kjamistan.com)
78 points by doener on Dec 29, 2017 | 12 comments



The video example of a turtle being misclassified as a rifle is pretty scary https://www.youtube.com/watch?v=piYnd_wYlT8

We're approaching a world where AI will be more and more relied upon in dangerous situations. Imagine someone getting killed for something ridiculous like inadvertently holding an adversarial example. Public trust would have a hard time recovering.


Would adding random noise to an image before running it through the classifier mitigate these kinds of attacks?


No, ironically, because classifiers are trained to be robust to noise. An adversarial example is generated by pushing the original image along a specific direction in a high-dimensional input space. Random noise is vanishingly unlikely to undo that perturbation, so the classification stays wrong.
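
To make that concrete, here's a minimal sketch (mine, not from the talk) of the usual FGSM-style attack: the adversarial step moves the input along the sign of the loss gradient, while uniform random noise of the same magnitude points in an essentially unrelated direction, so it rarely restores the original label. The `model`, `image`, and `label` below are untrained placeholders, only meant to show the mechanics.

```python
# Sketch: why random noise rarely undoes an adversarial perturbation.
# The model and input are toy placeholders, not a real trained classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Placeholder classifier and input (stand-ins for a real model/image).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32)
label = torch.tensor([3])
eps = 0.03  # perturbation budget

# Fast Gradient Sign Method: one step along the sign of the input gradient.
image.requires_grad_(True)
loss = F.cross_entropy(model(image), label)
loss.backward()
adversarial = (image + eps * image.grad.sign()).detach().clamp(0, 1)

# Random noise of the same magnitude almost never points back along the
# adversarial direction, so the (mis)classification usually survives it.
noisy = (adversarial + eps * (2 * torch.rand_like(adversarial) - 1)).clamp(0, 1)

with torch.no_grad():
    print("clean prediction:      ", model(image).argmax(1).item())
    print("adversarial prediction:", model(adversarial).argmax(1).item())
    print("adversarial + noise:   ", model(noisy).argmax(1).item())
```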


>Imagine someone getting killed for something ridiculous like inadvertently holding an adversarial example.

This would never happen with humans, right? Of course it would.


With humans there is accountability (in theory). Who is accountable for the model?


A model is adjustable, correctable, trainable, and improvable. Humans are much less so.


Re: adversarial ML & steganography. There were 2 papers at this year's NIPS studying these ideas (http://papers.nips.cc/paper/6802-hiding-images-in-plain-sigh... and http://papers.nips.cc/paper/6791-generating-steganographic-i...)



I wonder how sensitive these results are. I mean, if you run one more training round on the network, does the rifle immediately turn back into a turtle in the eyes of the network, so that you would need to generate a new adversarial turtle for every single network? Or does that turtle look like a rifle to all current neural networks? Or, most likely, is it somewhere in between?


If you don't change the data, why would the network change its results after just a few more training iterations? It has already converged.


Because, if I understood anything correctly, the method tries to find the smallest possible changes that cause the network to make the incorrect classification. And those smallest possible changes just might be very sensitive to whatever small random ripples in the weights a network has. But I am far from a neural network expert, so I can't really answer.
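
For what it's worth, a toy version of that experiment is easy to set up: craft an example against one network, then check whether it still fools that network after a few extra training steps, and whether it transfers to an independently trained copy. Everything below (tiny linear models, random data, the 0.03 budget) is a placeholder; it only shows the shape of the test, it doesn't answer the question.

```python
# Sketch: probe how brittle an adversarial example is by (a) taking a few
# more SGD steps on the same model and (b) trying it on an independently
# trained copy. Toy placeholders throughout: random data, tiny linear models.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def make_model():
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

def train(model, data, labels, steps):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(data), labels).backward()
        opt.step()

# Toy dataset and two independently initialized/trained models.
data, labels = torch.rand(64, 3, 32, 32), torch.randint(0, 10, (64,))
model_a, model_b = make_model(), make_model()
train(model_a, data, labels, steps=50)
train(model_b, data, labels, steps=50)

# Craft an FGSM example against model_a for one training image.
x, y = data[:1].clone().requires_grad_(True), labels[:1]
F.cross_entropy(model_a(x), y).backward()
adv = (x + 0.03 * x.grad.sign()).detach().clamp(0, 1)

def fooled(model):
    return model(adv).argmax(1).item() != y.item()

print("fools model_a right away:", fooled(model_a))
train(model_a, data, labels, steps=10)      # a few more learning rounds
print("fools model_a after extra training:", fooled(model_a))
print("transfers to model_b:", fooled(model_b))
```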


I think it would be beneficial to explicitly mention model extraction attacks since they (kind of) enable such attacks: https://news.ycombinator.com/item?id=12557782



