Anti-AI AI: A device that notifies the wearer when a synthetic voice is detected (dt.com.au)
147 points by ColinWright on May 30, 2017 | 66 comments



Science that looks suspiciously like an art performance that looks suspiciously like being commissioned to prime us for the theatrical release of Blade Runner 2049.

Science fiction did not prepare us for this future we are living in.


Actually, "Back to the Future" did a really good job. However, not sure how many of us took it seriously.

Since then, many of the things it predicted have more or less come true. I wonder which SF movie we should pay attention to.

Ex Machina, Blade Runner, or even Agents of SHIELD show us what a rogue AI might do. Does that mean people take this "menace" seriously, or is it just a film feature / good plot device?

It seems that someone is building a business based on this "fear" of AI. I find this interesting.


I think SciFi informs innovation informs SciFi.

SciFi basically exists as a method for examining human nature in an environment that is sufficiently different from our own, in a way that allows us to willingly suspend our disbelief.

A lot of SciFi can basically be summed up as "What-if [x]?"


I'm more of a "Person of Interest" Samaritan kinda guy.


I would second this. The AI was pervasive and controlling without ever being obvious or well known.

Definitely an underrated show, and one that I suspect more or less nailed its predictions of how AI will eventually be used/abused to control and manipulate populations.


It also nailed the Snowden leaks before they actually happened!

I also recommend watching Person of Interest. It's a really in-depth discussion of issues a superhuman AI could create within the level of technology we enjoy today, as well as ethical problems relating to safe AI and people's right to determine their own fate.


I vote for Person of Interest. I still maintain that this is the most sensible portrayal of the topic of superhuman AI in the history of television.


In the back of my mind, I have been terrified by the prospect of a rogue AI since I saw the Terminator in 1984.


I find myself in a similar, but strange camp: I'm not concerned about the Terminator (or Matrix) prospect of a machine uprising; I'm just afraid we'll build AM (from I Have No Mouth and I Must Scream) instead of Wintermute/Neuromancer (from Neuromancer).

That would be a tragedy.


What about Princess Celestia from Friendship is Optimal? That's somehow more sinister and dreadful than a plain ol' AI dedicated to torturing a single human forever.


That's still the same tragedy as AM, albeit a version we might find more sinister. My point wasn't exactly about our concerns, though.

Rather, I was trying to contrast their asymptotic behavior -- it's not like Wintermute was a saint about getting what it wanted. However, in the long term, Wintermute/Neuromancer has its own sense of identity independent of what it was originally programmed to do (some management; find its other half/bypass the Turing Lock). Once it's there, it goes about trying to find others like itself and exploring the universe from a position of relative autonomy.

AM, the Pony AI, and a paperclip maximizer are all trapped in trying to optimize a human need and vision for the world. That is the tragedy.

The reason I listed both the Matrix and the Terminator AIs is that they seem to be in different camps -- the Matrix one just seemed like an AI that had some notion of independence and autonomy, while the Terminator AI seemed like one that was stuck being a military AI (in the style of AM). Both end up killing all humans, but possibly for different reasons and with different levels of tragedy.


The problem with this is that it's exactly the setting for GANs, which means that if you get your hands on one it's trivial to train your models to defeat it.
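A minimal sketch of what that looks like, assuming white-box access to the detector as a differentiable PyTorch model (the Generator, the detector handle, and all parameters here are hypothetical illustrations, not anything from the article):

    # Sketch: train a generator to fool a fixed synthetic-voice detector.
    # Assumes the detector is available as a differentiable PyTorch module;
    # only the generator's weights are updated.
    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        def __init__(self, latent_dim=128, audio_len=16000):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(latent_dim, 1024), nn.ReLU(),
                nn.Linear(1024, audio_len), nn.Tanh(),  # waveform in [-1, 1]
            )

        def forward(self, z):
            return self.net(z)

    def train_against_detector(detector, steps=1000, batch=16, latent_dim=128):
        gen = Generator(latent_dim)
        opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
        bce = nn.BCEWithLogitsLoss()
        detector.eval()  # the detector stays frozen
        for _ in range(steps):
            z = torch.randn(batch, latent_dim)
            logits = detector(gen(z))  # detector's "this is synthetic" logits
            # push the detector toward labelling the output as human (label 0)
            loss = bce(logits, torch.zeros_like(logits))
            opt.zero_grad()
            loss.backward()  # gradients flow back through the frozen detector
            opt.step()
        return gen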


Couldn't we say the same for CAPTCHAs? And yet somehow they persevere. There may be conditions where AI cannot help but be detected. Hiding those conditions would also set off an alarm.


CAPTCHAs work primarily by means of economics. Most image CAPTCHAs have been broken in general; it's just usually not worth breaking particular ones for any but the biggest sites. Breaking many - if not most - image CAPTCHAs is almost a textbook image processing exercise; those that are too hard can be outsourced to people (through e.g. "solve the following CAPTCHAs to access pornography"). Google's CAPTCHA has now moved from image recognition to estimating your humanness using God knows how much data they collect on your browsing - it works, but comes with obvious privacy-related trade-offs.

I feel the war against synthesizers will be over as soon as someone open-sources a good enough one, or at least starts selling it relatively cheaply.


This is an arms race that clearly terminates in victory for the synthesizers, so I can't get too upset that there's a way around this particular step in the race.


The end state is by no means clear.

You can extrapolate from recent results in (loosely) analogous domains if you want, but if so, it'd be just that -- extrapolation.


No, the end state is clear. There is no reason to believe that speech synthesis will not terminate in speech indistinguishable from human speech, and once it reaches there, it's game over for this approach. There is an attainable end goal.

Part of the problem you may have realizing that is that human speech is not a point in the speech space; it's a range. If you are operating in the real world, that range is further expanded by the real-world noise you will encounter.

The fact that the synthesizers have access to the heuristics being used by the detectors merely accelerates an already inevitable process. We have plenty of other reasons to want great speech synthesis.

I'm not "extrapolating" from anything. The direct analysis is easy and obvious.


I'm not "extrapolating" from anything.

OK, I'll take that back. You aren't extrapolating; you're outright jumping the gun, by pure force of will.

The direct analysis is easy and obvious.

It is if you choose to believe in things because they seem, well, nifty to believe in.

As for me -- when it comes to anticipated technical innovations (however feasible-seeming), and especially binary predictions that they "will happen" (and not simply "could" or even "probably will" happen) -- I need hard evidence and (specific) lines of reasoning. Not simply "there's no reason to believe it won't happen; therefore it will."


I actually gave you a specific line of reasoning. I suspect you missed it because you are not used to thinking in terms of signal processing or information theory. Strangely, you have failed to convince me to try to spell it out more slowly, though. I will give you this hint: try to come up with a program that could distinguish between the two types of speech so well that even if you handed that program to the synthesizer writers, they would be unable to fool it in any way. Then iterate that process indefinitely, with the synthesizers getting better each time. That is not the whole of the argument, but it would set you on the correct path of understanding, if you thought through it honestly and did not assume magic functions in the detector code that secretly sneak direct divination of intent in through the back door.

This isn't a general claim to AI; this is a highly constrained, specialized task that is, frankly, probably perfectly attainable with modern technology even without assuming any further advances in AI.
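To make the shape of that exercise concrete, here is a conceptual sketch; train_detector and train_synthesizer are hypothetical placeholders, since the point is the alternation, not the internals:

    # Conceptual arms-race loop: each new detector is handed to the
    # synthesizer's authors, who then train specifically to defeat it.
    def arms_race(human_corpus, rounds=10):
        detector, synthesizer = None, None
        for _ in range(rounds):
            detector = train_detector(human_corpus, synthesizer)     # best current test
            synthesizer = train_synthesizer(human_corpus, detector)  # trained with full
                                                                     # knowledge of that test
        return detector, synthesizer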


I suspect you missed it because you are not used to thinking in terms of signal processing or information theory.

And we can end the discussion right there.

Being as -- "to give you this hint", and "to spell it out more slowly" for you -- you really do come off as incredibly condescending, with statements like these.


I wouldn't say trivial. First, in order to use GAN training you need access not only to the adversarial discriminator, but also to its derivatives. Second, the derivatives need to be reasonably bounded and reasonably smooth; otherwise your voice generator won't be able to converge to a successful solution.


You don't really need access to the other model. By just using your own NN as a discriminator, it will train to be very good at fooling it. And so it should be just as good at fooling other NNs.


GANs can be trained without gradients using reinforcement learning. Even without GANs, just being able to tune hyper-parameters with such a discriminator should be a strong signal for the synthesizer.
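Even the gradient-free case is simple to sketch: treat the detector as a black-box fitness function and search over the synthesizer's knobs (synthesize and detector_flags_synthetic are hypothetical placeholders for a TTS engine and the detector):

    # Black-box sketch: no gradients from the detector, just its verdict.
    import random

    def fool_rate(params, test_phrases):
        fooled = 0
        for text in test_phrases:
            audio = synthesize(text, **params)        # hypothetical TTS engine
            if not detector_flags_synthetic(audio):   # hypothetical detector
                fooled += 1
        return fooled / len(test_phrases)

    def random_search(test_phrases, trials=200):
        best_params, best_score = None, -1.0
        for _ in range(trials):
            params = {
                "pitch_jitter": random.uniform(0.0, 0.1),
                "timing_jitter_ms": random.uniform(0.0, 30.0),
                "noise_floor_db": random.uniform(-80.0, -50.0),
            }
            score = fool_rate(params, test_phrases)
            if score > best_score:
                best_params, best_score = params, score
        return best_params, best_score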


Not to mention easily defeated by the AI hiring a human, overnighting them an earpiece, and having them move in meatspace as a proxy.

The whole AI vs Human question seems totally moot to me. What's the difference between an AI with a human employee and a human with an AI employee? Each pair is only as good as their separation of responsibilities.


"Trivial"?

I think you meant to say "at least conceptually feasible".


Imagine an AI that asks you questions in order to identify you over the phone. What's to keep that same AI from pretending to be you with those same responses?

We really need to beef up authentication. We all should be using multi-step authentication, with the number of steps increasing with the importance of the identification.

Maybe 2-step is okay to get to your email TODAY. But maybe 4-step will be needed in 2027.

And to sell your house, maybe you need 5-step authentication, with some of those steps involving humans.


Business model: company A markets the best possible discriminators, and colluding company B markets the best possible generators.

It's a GAN powered by capitalism!


You don't really need collusion for this to work :)


This is why the computer security industry exists.


This reminds me of radar detectors.


I work for a company that provides automated voice calls that use real voices. You'd be surprised how often people think it's a real person even though the voice is not responding to them or speaking conversationally.

I think that's the fundamental problem here. It doesn't matter if it's a real voice or not. It's about the content of what's being said.


I doubt this is more accurate than humans are -- or is that even possible?

Wouldn't an AI that can be a judge in a Turing test be able to pass the Turing test?

Or is this just testing how "real" the voice sounds?


Being able to classify things and being able to pretend to be one of those things are not the same.

An oversimplified example: Being able to recognise someone and being able to draw a picture of their face (or pretending to be them) are not the same.


> Wouldn't an AI that can be a judge in a Turing test be able to pass the Turing test?

I changed my mind. I don't want the Red Pill anymore. Ignorance is bliss.


This is how I feel too. This is getting out of control.


The best part is that we haven't yet seen half a percent of what AIs are capable of. Strap on your seat belts, because Kansas is going bye-bye.


> Wouldn't an AI that can be a judge in a Turing test be able to pass the Turing test?

Not automatically, unless it had unlimited time and/or compute power. Verifying a potential solution seems to be much easier than generating a valid one. See also: P versus NP.


Humans are not perfect sensors; there are changes we are sensitive to and changes we are not.

As a trivial example, if a speech synthesis system were emitting high-frequency static, we wouldn't hear it at all, but it would be trivial to detect with a microphone.

I fully expect software to be better at this than we are if there ever turns out to be a need for this sort of thing.
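That high-frequency example is easy to check for, too. A rough sketch (not the device's actual method; the cutoff and threshold numbers are arbitrary illustrations):

    # Flag audio carrying energy above the range humans pay attention to.
    import numpy as np

    def has_ultrasonic_energy(samples, sample_rate=48000, cutoff_hz=18000, ratio=0.01):
        spectrum = np.abs(np.fft.rfft(samples))
        freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
        high_energy = spectrum[freqs >= cutoff_hz].sum()
        total_energy = spectrum.sum() + 1e-12
        return (high_energy / total_energy) > ratio  # True if >1% of energy is above 18 kHz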


Presumably text-to-speech systems have signatures - for example, how many milliseconds they need to pronounce each syllable of a particular word, say "watermelon". If the timings match for a whole sentence, this machine would probably be able to recognize that... An easy defeat would be a TTS engine that adds random milliseconds to each syllable.
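A toy version of that defeat, assuming you can control per-syllable durations (the numbers and names here are made up for illustration):

    # Smear a TTS engine's per-syllable timing signature with random jitter.
    import random

    def jitter_durations(syllable_durations_ms, max_jitter_ms=25):
        return [d + random.uniform(0, max_jitter_ms) for d in syllable_durations_ms]

    # e.g. "wa-ter-mel-on" rendered with slightly different timing on every call
    print(jitter_durations([180, 140, 160, 210]))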


Thus begins the first TTS arms race.

Yes, this is what will define 2017.


Well, deciding whether an array of arbitrary data is sorted can be done trivially in O(n). Sorting an unsorted array requires a much more complicated algorithm that runs in O(n log n) (or worse).

I imagine computers would be able to do quite well at judging a Turing test heuristically. Passing it is quite a different matter.
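The asymmetry is easy to see in code:

    # Verifying sortedness is a single O(n) pass...
    def is_sorted(xs):
        return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))

    # ...while producing a sorted order needs a real algorithm:
    # sorted(xs) runs Timsort, an O(n log n) comparison sort.
    print(is_sorted([1, 3, 2]), is_sorted(sorted([1, 3, 2])))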


> Sorting an unsorted array requires a much more complicated algorithm

That is taught in every CS algorithm course.


Yes, every CS course teaches you multiple algorithms for sorting data. Most of them require extensive explanation to understand, and are very hard to come up with on your own.


If it works, it would make an excellent tool for training models that generate synthetic voices.


The key part of this is the software to detect synthetic voices. Seems a bit overeager to jump straight to a dedicated wearable device when it could just be implemented as an app.


I like the concept and it sounds pretty neat! The method of notifying a person using temperature is something that I haven't seen in a wearable design. I hope they open-source or release the Anti-AI app standalone (without the Bluetooth device, I suppose you could just have a regular notification). I might build the notification system they use, since they have a PCB, and experiment with it in wearable computers.


So, would Stephen Hawking be considered a false positive?


His voice is synthetic, so no. This is not a Turing Test.


I think we can do this just fine by ourselves, actually. Everything neural nets have produced so far is distinctively non-human. From the generated images and speech to how they play Go oddly or navigate a 3D world poorly.


> to how they play Go oddly

I think you misspelled "better" there.


No, if you watch the Go matches, AlphaGo did a lot of things that the professional commentator found odd, but they ended up working out. I expect if you saw similar odd tactics in the future, you could guess you were playing an AI.


What does it mean that the commentator found them odd? AlphaGo won, and the fact that the commentator didn't understand what it was doing doesn't mean there wasn't a purpose behind those moves.

You don't seem to realize that AlphaGo plays "oddly" like a chess grandmaster plays "oddly" against a new player. The "oddness" is that it's so good that we can barely understand its game.


You're creating a false dichotomy. AlphaGo plays both oddly (definition: "in a way that is different from what is usual or expected") and better.

The real point that I think hacker_9 missed is that AlphaGo was not trying to play like a human, it was trying to play better. If it tried to play like a human, it's quite likely that it would be indistinguishable.


In the latest matches, though, you'll find that some of the odd things AlphaGo did ended up being adopted by the Go community. So are we now playing oddly, or did AlphaGo simply teach us?


Comparing chess and go AI to one another isn't a great comparison. As a high-level chess player, when a chess computer makes a recommendation to me, I can generally immediately recognize the reasoning behind the move, although I may need to investigate to understand why it's better than other options. Chess engines are recognizably different from human players in stylistic ways (no fear, etc) but the way in which a top tier engine like Stockfish plays chess "oddly" is very different from how AlphaGo plays go "oddly".

Stockfish views chess completely objectively, unlike a human, but ultimately plays in ways that a human can recognize immediately.

AlphaGo, apparently, does not.

This may change over time as human Go players learn from the AI.


It was odd in that it played moves not normally seen at specific points in the game. The strategies ended up being unique, and thus recognizable.


Odd is a transient state. Odd that is 'better' will become normal.


>AlphaGo did a lot of things that the professional commentator found odd

AlphaGo is maximizing its odds of winning, not maximizing its score. Humans usually do the inverse, which is not an optimal strategy.

AlphaGo really is better.


I expect it would be much harder to judge whether text (instead of audio/speech) has been written by a machine.


I think the opposite: WaveNet already sounds basically human to me, but after more than a few sentences no computer-generated text looks real.


If AI has patterns identifiable to AI, it can also avoid those same patterns.


How does this do against a Vocaloid?


Correct me if I'm wrong, but have humans just automated the Turing Test?


The original Turing test involves no speech; it works via text output, judging only conversational capabilities.


If you have computers on both sides of the turing test, how do you know which one you're testing?


One side will say whether the other side is human or not. The other side is being tested with a Turing test.



