Sphinx is pretty awful (remember the time before good speech recognition existed...

dharma1 · on June 2, 2016

https://github.com/alumae/kaldi-gstreamer-server

Takes the pain out of it

IshKebab · on June 2, 2016

Ah interesting link, I hadn't seen that.

amelius · on June 2, 2016

> None of the open source speech recognition systems (or commercial for that matter) come close to Google.

Is that because of the data they have, or because of their superior algorithms?

kuschku · on June 2, 2016

It’s because they used data from the public to train their models.

If someone suddenly applied copyright's ban on remixes to the training of neural networks, and insisted that licenses for such use have to be granted explicitly, Google would lose 90% of their advantage over other companies.

Personally, I’d be in favor of requiring companies to open source their trained models if the training data included data supplied by users rather than paid employees.

davexunit · on June 2, 2016

I think the world needs the equivalent of OpenStreetMap but for speech data, so that the data is under a copyleft license that legally enforces reciprocation when the corpus is used or modified.

ashitlerferad · on June 2, 2016

The closest is VoxForge:

http://voxforge.org/

davexunit · on June 2, 2016

Didn't know about VoxForge. Thanks!

amelius · on June 2, 2016

I sympathize, but good luck with that :)

IshKebab · on June 2, 2016

Both, I think, but mostly the data. Baidu's Deep Speech is meant to be very good, and its design is public (they even open sourced one component of it).