Deep Speech: Scaling up end-to-end speech recognition

dthal · on Feb 14, 2015

I'll kick things off by linking to the comments from the last time this was posted: https://news.ycombinator.com/item?id=8769067.

natch · on Feb 15, 2015

>16.0% error on the full test set

Does anyone know what error rates other current approaches get?

woodson · on Feb 15, 2015

Look at Table 3 in the paper. Also, 16.5% error ;)

natch · on Feb 16, 2015

Oh thanks, and, wow!

infocollector · on Feb 14, 2015

Is there a GitHub repo that I can compile and try?

woodson · on Feb 15, 2015

Providing the source would be a step toward better transparency and reproducibility (the text does not give enough detail for even someone working in the field to reproduce what they did and arrive at the same results); however, the more crucial thing is the data. Switchboard, Fisher, and WSJ are available (provided you have a few grand to spend), but they say they collected 5000 h of read speech from 9600 speakers. That's a huge effort!

xai3luGi · on Feb 15, 2015

An alternative source of data that you can contribute to:

http://www.voxforge.org/

egfx · on Feb 15, 2015