Voice interfaces are likely to be a big part of our future, and to build the best voice ML tech you need a ton of real-world data. I wouldn't be overly surprised to see them pay people to own and use these things.
I agree on the land grab. But on data, we're talking about orders of magnitude: accents, regional phrasing, idioms, and code-switching between languages.
The models are moderately good at basic speech-to-text and some grammatical parsing, but there's a long way to go. The simplicity of the current command set isn't an indicator of their aspirations. Eventually you'd want a system capable of understanding any utterance by any human, at least as well as another human could, and certainly not just in command syntax.