Example 1, if we're referring to the likes of Siri and Alexa, isn't thanks to improvements in personal computer technology - those platforms send your speech recording off to a massive datacenter for processing. No need for 10 GHz processors there.
Example 2 requires the use of depth-sensing and heat-sensitive cameras to avoid trivial "show a photo of an authorized user" attacks - that's not really CPU-dependent either.
Not sure if Siri and Alexa (and Google Now) send every recording to a data center - the .NET speech recognition libraries ship with Windows, so all audio data stays local to your PC, afaik. I'd expect Cortana leverages these libraries as well, instead of sending all data to a remote server.
You can build basic functionality into a speech bot in PowerShell ("PowerShiri, what time is it?" "PowerShiri, what is the weather?") as a weekend project: https://news.ycombinator.com/item?id=11663029
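For the curious, here's a minimal sketch of the idea using the System.Speech assembly that ships with Windows. The "PowerShiri" phrase and the canned reply are placeholders I made up, not the linked project's actual code:

    # Minimal local speech bot sketch (assumes Windows + .NET System.Speech)
    Add-Type -AssemblyName System.Speech

    # Recognizer listens on the default microphone, entirely offline
    $rec = New-Object System.Speech.Recognition.SpeechRecognitionEngine
    $rec.SetInputToDefaultAudioDevice()

    # Constrain recognition to a hardcoded phrase
    $choices = New-Object System.Speech.Recognition.Choices
    $choices.Add("PowerShiri what time is it")
    $gb = New-Object System.Speech.Recognition.GrammarBuilder
    $gb.Append($choices)
    $rec.LoadGrammar((New-Object System.Speech.Recognition.Grammar($gb)))

    # Synthesizer speaks the reply
    $tts = New-Object System.Speech.Synthesis.SpeechSynthesizer

    while ($true) {
        $result = $rec.Recognize()   # blocks until a phrase is heard
        if ($result -and $result.Text -eq "PowerShiri what time is it") {
            $tts.Speak("It is " + (Get-Date -Format "h:mm tt"))
        }
    }

Which is sort of the point of this subthread: recognition like this runs entirely on the local machine, no datacenter involved.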
Cortana sends it off to a server like everyone else. Those libraries are probably not Microsoft's latest and greatest.
Modern speech models are quite big - not so big that you couldn't load one on your desktop, but big enough that you'd notice. Couple that with the fact that the search or other service is going to happen on a server anyway, and client-side processing doesn't make sense beyond a few functions.
Alexa can only recognize the wake word locally, which is also why you can only choose between three of them - the device isn't able to detect anything else. After that, the command is sent to Amazon's datacenter for analysis.
I don't know of any smartphone or similar device that would interpret voice commands locally.
Android has had face unlock built in for some time. One naive way it mitigates the "unlock with a photo" vulnerability is an option that requires you to blink. Not as robust as 3D, heat-sensitive cameras, but at least it's not as trivial as showing a photo to beat it.
The impressive amount of processing power available on many smartphones today certainly contributes to this being a practical unlock method.
You're right - you also need to sweep a pencil across the photo's eyes to defeat that. If you're interested in more detail, Google starbug's excellent talk "Ich sehe, also bin ich ... du" ("I see, therefore I am ... you" - there's an English translation).
Siri sends data to the cloud to process what you're asking for, but the actual speech-to-text can work locally these days. Try it - put your iPhone in airplane mode and fire up dictation - works like a charm.
How much do you pay for data, and how much do you dictate, for this to be an actual gripe?
I assume Google encodes the data in something like iSAC [1], which needs 32 kbit/s for good-quality speech, so an hour of dictating is 3600 × 32 / (8 × 1024) ≈ 14 MB.
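Back of the envelope, assuming that 32 kbit/s figure holds:

    # hypothetical estimate: one hour of speech at 32 kbit/s
    $seconds = 3600; $kbps = 32
    $mb = $seconds * $kbps / (8 * 1024)   # kbit -> kB -> MB
    "{0:N1} MB" -f $mb                    # ~14.1 MB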