
Example 1, if we're referring to the likes of Siri and Alexa, isn't thanks to improvements in personal computer technology - those platforms send your speech recording off to a massive datacenter for processing, so there's no need for a 10 GHz processor there.

Example 2 requires depth-sensing and heat-sensitive cameras to avoid trivial "show a photo of an authorized user" attacks - that's not really CPU-dependent either.




Not sure if Siri and Alexa (and Google Now) send every recording to a data center - the .NET speech recognition libraries ship with Windows, so all audio data stays local to your PC, afaik. I'd expect Cortana leverages these libraries as well, instead of sending all data to a remote server.

You can build basic functionality into a speech bot in PowerShell ("PowerShiri, what time is it?" "PowerShiri, what is the weather?") as a weekend project (a minimal sketch follows below): https://news.ycombinator.com/item?id=11663029
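
For a sense of what that weekend project involves, here's a minimal sketch using the System.Speech assembly that ships with Windows. The phrases are just the examples above, and everything runs locally - no audio leaves the machine:

    # Minimal "PowerShiri" sketch: local recognition and synthesis
    # via the System.Speech assembly bundled with Windows.
    Add-Type -AssemblyName System.Speech
    $engine = New-Object System.Speech.Recognition.SpeechRecognitionEngine
    $engine.SetInputToDefaultAudioDevice()

    # Restrict the grammar to the two example questions.
    $choices = New-Object System.Speech.Recognition.Choices
    $choices.Add("PowerShiri what time is it")
    $choices.Add("PowerShiri what is the weather")
    $builder = New-Object System.Speech.Recognition.GrammarBuilder($choices)
    $engine.LoadGrammar((New-Object System.Speech.Recognition.Grammar($builder)))

    $synth = New-Object System.Speech.Synthesis.SpeechSynthesizer
    $result = $engine.Recognize()   # blocks until a phrase is matched
    if ($result -and $result.Text -like "*time*") {
        $synth.Speak("It is $(Get-Date -Format t)")
    }
    # A "weather" branch would need some local or remote data source.

The built-in engine is nowhere near the cloud services on open-ended dictation, but for a small fixed grammar like this it holds up fine.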


At least with Google Now, you can go into your Google Account and listen to recordings of everything you've ever asked via voice search:

https://myactivity.google.com/item?product=29

(If the product=29 parameter doesn't work, click on "Item View" on the left and filter by "Voice & Audio".)


Cortana sends it off to a server like everyone else. Those libraries are probably not Microsoft's latest and greatest.

Modern speech models are quite big - not so big that you couldn't load one on your desktop, but big enough that you would notice. Couple that with the fact that the search or other service is going to run on a server anyway, and client-side processing doesn't make sense beyond a few functions.


Alexa can only recognize the wake keywords locally, which is also why you can only choose between three of them - the device isn't able to detect anything else. After that, the command is analyzed in Amazon's datacenter.

I don't know of any smartphone or similar device that would interpret voice commands locally.
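
To make the keyword-spotting half of that pipeline concrete: matching against a deliberately tiny grammar is cheap enough to run anywhere. A minimal sketch, again assuming the Windows System.Speech assembly, with an illustrative wake-word list - anything outside the grammar simply never matches:

    # Sketch of keyword-only spotting with a three-entry grammar.
    Add-Type -AssemblyName System.Speech
    $engine = New-Object System.Speech.Recognition.SpeechRecognitionEngine
    $engine.SetInputToDefaultAudioDevice()

    $wakeWords = New-Object System.Speech.Recognition.Choices
    $wakeWords.Add("alexa")
    $wakeWords.Add("echo")
    $wakeWords.Add("computer")
    $builder = New-Object System.Speech.Recognition.GrammarBuilder($wakeWords)
    $engine.LoadGrammar((New-Object System.Speech.Recognition.Grammar($builder)))

    # On a real device, hearing the wake word is the point where audio
    # starts streaming to the datacenter for full analysis.
    Register-ObjectEvent -InputObject $engine -EventName SpeechRecognized -Action {
        Write-Host "Wake word detected: $($EventArgs.Result.Text)"
    } | Out-Null
    $engine.RecognizeAsync([System.Speech.Recognition.RecognizeMode]::Multiple)

Matching three fixed words is trivially cheap; it's the open-ended transcription after the wake word that gets shipped off.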


Android has had face unlock built in for some time. A naive way it mitigates the "unlock with a photo" vulnerability is an option requiring you to blink. That's not as robust as 3D, heat-sensitive cameras, but it's at least not as trivial to beat as showing a photo.

The impressive amount of processing power available on many smartphones today certainly contributes to this being a practical unlock method.


You're right - you also need to sweep a pencil across the eyes to defeat that. If you're interested in more detail, Google starbug's excellent talk "ich sehe, also bin ich... du" ("I see, therefore I am... you" - there's an English translation).


Is face unlock fooled by a video of a person blinking?


Siri sends data to the cloud to process what you're asking for, but the actual speech-to-text can work locally these days. Try it - put your iPhone in airplane mode and fire up dictation - works like a charm.


Can I ask what you would dictate that doesn't need an internet connection? A note or a shopping list comes to mind, but that's it for me.


Actually, I hate the fact that Google sends my voice over the network - not because of privacy, but because I have to pay for the additional data sent.


How much do you pay for data, and how much do you dictate, for this to be an actual gripe?

I assume Google encodes the data in something like iSAC [1], requiring 32 kbit/s for good-quality speech, so an hour of dictation is 3600 s x 32 kbit/s / (8 x 1024) ≈ 14 MB.

[1] https://en.m.wikipedia.org/wiki/Internet_Speech_Audio_Codec
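
For anyone who wants to double-check the arithmetic, a throwaway PowerShell sketch (the bitrate is the assumed iSAC figure above):

    # Back-of-the-envelope data usage for an hour of dictation,
    # assuming a constant 32 kbit/s speech codec.
    $seconds = 3600
    $kbitPerSec = 32
    $kilobytes = $seconds * $kbitPerSec / 8      # kbit -> kB
    "{0:N1} MB per hour" -f ($kilobytes / 1024)  # ~14.1 MB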


Dictation in iOS9 and later can be used without an internet connection on newer iPhones.


RE example 1, there is offline voice recognition, though.



