
Yeah, but it'd be like getting a blood test and trying to interpret it yourself :).

There are so, so many variables - and many engines could be optimized for 'your voice and the things you're going to say right now, given CPU/memory, quality of microphone, background noise, etc.'

The game of NLP is inherently about dealing with 'noisy channels' (in the academic sense) in which there is kind of a probabilistic guarantee of imperfection. So then it comes down to creating the best products in a given context, which is almost always 'less than optimized' for any individual.
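To make the 'noisy channel' framing concrete, here's a minimal sketch of the textbook decoding rule: pick the word that best explains the audio, weighted by how likely that word is in the first place. Every word and number below is invented purely for illustration - real engines score lattices of phones and word sequences, not single isolated words.

    # Minimal noisy-channel decoder: choose argmax_w P(audio | w) * P(w).
    # All scores are made-up toy numbers, not output from any real engine.

    acoustic = {         # P(audio | word): how well each candidate explains the signal
        "there": 0.40,
        "their": 0.38,
        "they're": 0.20,
    }

    prior = {            # P(word): language-model estimate of how likely the word is
        "there": 0.050,
        "their": 0.030,
        "they're": 0.012,
    }

    def decode(candidates):
        """Return the candidate with the highest acoustic * prior score."""
        return max(candidates, key=lambda w: acoustic[w] * prior[w])

    print(decode(["there", "their", "they're"]))   # -> "there"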

So there's model size, CPU/RAM, quality of signal (microphone, network), just to start.

Optimizing for standard English probably means reducing quality for people with accents. Maybe in a specific context you could go from 80% accuracy to 85% accuracy for 'most of us' - but then you go from 60% to 40% for anyone with an accent.

And if you reduce the accepted vocabulary, you can get way better results. Of course, we all might want to say words like 'obvolute' and 'abalienate' every so often.

Kind of thing.
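
As a made-up sketch of that trade-off (the word lists and scores below are invented): shrinking the lexicon removes confusable rare words and helps in the common case, but it also means a user who really does say 'obvolute' can never be recognized.

    # Toy illustration of vocabulary restriction; score[w] stands in for
    # P(audio | w) * P(w) from a decoder. All numbers are invented.

    def decode(score, vocab):
        return max(vocab, key=lambda w: score[w])

    full_vocab       = ["absolute", "obvolute"]
    restricted_vocab = ["absolute"]          # rare word removed from the lexicon

    # Case 1: user said "absolute", but noise nudges the rare word slightly ahead.
    noisy = {"absolute": 0.018, "obvolute": 0.020}
    print(decode(noisy, full_vocab))         # -> "obvolute"  (error)
    print(decode(noisy, restricted_vocab))   # -> "absolute"  (restriction helps)

    # Case 2: user really said "obvolute".
    clear = {"absolute": 0.005, "obvolute": 0.030}
    print(decode(clear, full_vocab))         # -> "obvolute"  (correct)
    print(decode(clear, restricted_vocab))   # -> "absolute"  (forced error)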

It's really fun from an R&D perspective, but it's a product manager's nightmare. Consumer expectations with these technologies are really challenging: despite the inherent ambiguity in the system, people kind of want perfection. And there are always corner cases where it seems like things should be easy, but they're not - because the word you're saying is common, and you're saying it 'perfectly clearly' ... but little do you know there are 2 or 3 other very rare words that sound 'just like that', ergo ... problems.
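
A made-up illustration of that corner case (all words and probabilities below are invented): once a couple of other words sound exactly like the one you said, the audio can't tell them apart no matter how clearly you speak, so the language model's prior decides the winner - and it won't always decide in your favour.

    # Toy posterior P(w | audio) proportional to P(audio | w) * P(w), with invented numbers.

    def posterior(acoustic, prior):
        joint = {w: acoustic[w] * prior[w] for w in acoustic}
        total = sum(joint.values())
        return {w: round(joint[w] / total, 3) for w in joint}

    # You dictate "caret", clearly - but "carat" and "karat" sound identical.
    acoustic = {"caret": 0.33, "carat": 0.33, "karat": 0.33}   # audio can't separate them
    prior    = {"caret": 0.002, "carat": 0.004, "karat": 0.001}

    post = posterior(acoustic, prior)
    print(post)                      # -> {'caret': 0.286, 'carat': 0.571, 'karat': 0.143}
    print(max(post, key=post.get))   # -> "carat": the prior, not your pronunciation, decided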

From a product perspective, it basically always feels 'broken' which is such a terrible feeling :).

But it can be fun if you like really hard product challenges which have less to do with tech and more to do with pure user experience, expectations, behaviours etc.




I think this is a great summary of where we're at now, but stronger broad-coverage language models (i.e. better expectations for what people say, better generative models of speakers) look feasible (for a couple of tens to hundreds of millions in R&D) and could bring ASR up to parity with people. It's pretty clear we are getting close to the limits of what acoustics can offer, and the language model is the next frontier, both in terms of accuracy and real-time performance.


Agreed. We are getting close to a nice reality as all of the pieces are getting better.


For speakers of non-standard English, a quick test might be a useful sanity check. Many speech-to-text algorithms fail catastrophically with certain accents.



