There is a similar attack against image classifiers.
So is this just a proof of concept or can this be exploited in the wild?
Based on my understanding, you need access to the network itself (weights, biases, activation functions, architectural topology) to pull off this kind of attack. Doesn't seem like this could easily be duplicated by an outside agent.
To generate adversarial examples, you really just need the Jacobian of the network, which, without the network architecture etc., would have to be estimated via finite differences. That means running the same signal through the model many times (lots, for large signals!), with a minor perturbation to each component. I think whether or not this is feasible depends on the circumstances. Also, I could be wrong about everything because I'm not super current on this stuff.
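For concreteness, here's a minimal sketch of what that finite-difference estimation looks like in Python/numpy. The function model_loss is a made-up black-box score the attacker wants to drive down (say, the target model's loss on the transcription they're trying to force); the toy signal has 100 samples, while real audio has tens of thousands, at roughly one query per sample per estimate:

    import numpy as np

    def finite_difference_grad(model_loss, x, eps=1e-3):
        # Estimate d(loss)/d(x) by perturbing each sample of x in turn.
        # model_loss is a black box: we only see scalar outputs, never weights.
        # Cost: one extra query per component of x -- huge for real audio.
        grad = np.zeros_like(x)
        base = model_loss(x)
        for i in range(x.size):
            x_pert = x.copy()
            x_pert[i] += eps
            grad[i] = (model_loss(x_pert) - base) / eps
        return grad

    # Toy stand-in for a target model's loss (illustration only).
    rng = np.random.default_rng(0)
    w = rng.normal(size=100)
    toy_loss = lambda x: float(np.tanh(w @ x))

    x = rng.normal(size=100)           # pretend this is a short audio signal
    g = finite_difference_grad(toy_loss, x)
    x_adv = x - 0.01 * np.sign(g)      # step against the estimated gradient
    print(toy_loss(x), toy_loss(x_adv))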
"Our attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN"
I wouldn't be surprised if these speech-to-text models are susceptible to black-box attacks in the same way that image classifiers are:
https://arxiv.org/abs/1605.07277
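Roughly, that substitute-model attack only needs the target's output labels. Here's a toy sketch under those assumptions, with a hypothetical query_target standing in for the black-box API and a small scikit-learn model as the local substitute (the paper uses a DNN plus Jacobian-based dataset augmentation, which this loop omits):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def query_target(x):
        # Hypothetical black box: returns only a predicted label, no gradients.
        secret_w = np.array([1.5, -2.0])          # unknown to the attacker
        return int(x @ secret_w > 0)

    # 1. Synthesize inputs and have the target label them.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    y = np.array([query_target(x) for x in X])

    # 2. Train a local substitute on those (input, target-label) pairs.
    substitute = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, y)

    # 3. Craft adversarial examples against the substitute (full white-box
    #    access to it), then hope they transfer back to the real target.
    agreement = (substitute.predict(X) == y).mean()
    print("substitute matches the target on", round(100 * agreement, 1), "% of queries")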
That's the purpose of magic[1], isn't it? Maybe one day magic tricks will be designed by AI. It'd be funny, since that would give Arthur C. Clarke's famous quote ("any sufficiently advanced technology...") a whole new meaning.
I'm pretty sure some alarms use a strobe light at a particular frequency combined with a loud 2kHz tone to discombobulate intruders. That's more sensory overload than an adversarial signal, though.
You could argue that cartoons or line drawings are "adversarial examples" that are reliably identified by the human visual system as representing a particular thing despite being quantitatively nothing like what they're meant to represent.
Adding noise to a signal to cause it to be perceived unintuitively: it's possible that, as others have pointed out, optical illusions are an example of this. But in practice, if such a thing existed, I'm not sure we'd be able to tell it's adversarial.
It won't work on a Google phone right now. From the paper: "The audio adversarial examples we construct in this paper do not remain adversarial after being played over-the-air, and therefore present a limited real-world threat; however, just as the initial work on image-based adversarial examples did not consider the physical channel and only later was it shown to be possible, we believe further work will be able to produce audio adversarial examples that are effective over-the-air."
However, prior work [1] was shown to work on Google phones; it is just much more noticeable.
I believe this requires the attacker to have access to the ASR neural net's weights, so Mozilla's seems like the only popular framework that's vulnerable right now (not that I'm opposed to them keeping things open).
This is exactly why we need to keep ASR tech open. These kinds of issues affect DNNs in general, and other ASR engines based on them are also vulnerable. Having models and the data used to train them open is a great way to help academia make progress in this space.
Adversarial examples don’t actually have to be generated with the same neural network. If you want to attack another NN, you can make your own with similar training data and similar structure and the attacks can carry over to some extent.
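As a toy illustration of that transfer, assuming two independently trained scikit-learn models standing in for the two networks: craft an FGSM-style perturbation using only the surrogate's weights, then check whether it also moves the other model's prediction. All names here are made up for the sketch, and transfer rates between real DNNs vary a lot:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)

    def make_data(n):
        # Similar-but-not-identical training sets for the two models.
        X = rng.normal(size=(n, 20))
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
        return X, y

    surrogate = LogisticRegression(max_iter=1000).fit(*make_data(2000))  # attacker's own model
    target = LogisticRegression(max_iter=1000).fit(*make_data(2000))     # the "victim" model

    # FGSM-style step computed from the surrogate's gradient alone
    # (for logistic regression that gradient is just its weight vector).
    x = rng.normal(size=20)
    eps = 0.5
    direction = -np.sign(surrogate.coef_[0]) * (2 * surrogate.predict([x])[0] - 1)
    x_adv = x + eps * direction

    print("surrogate:", surrogate.predict([x])[0], "->", surrogate.predict([x_adv])[0])
    print("target:   ", target.predict([x])[0], "->", target.predict([x_adv])[0])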