Any reason why not to use Kaldi (https://github.com/kaldi-asr/kaldi)? AFAIK it u...

jandrese · on June 2, 2016

Frankly, Kaldi is nearly impossible for mere mortals to use. It's 100% targeted at people doing PhD work in speech recognition who have a colleague who already knows how it works and can set it up for them.

IMHO, there is a big opportunity for someone to come along and repackage it in a user-friendly way, but the people who actually understand it are too busy doing "real work" to bother with such frivolity.

squeaky-clean · on June 2, 2016

Like others are saying, it's just much harder to use. The official tutorial even says "The intended audience for this tutorial is either speech recognition researchers, or graduates or advanced undergraduates who are studying this area anyway." in the first paragraph. It seems like Kaldi is meant for people who actually know how speech recognition works, while other tools are meant for people who just want some text from some audio without really understanding how.

For example, I've been playing with home automation and speech recognition, and I've been able to get any Sphinx-based recognizer working in a single sitting, in a few hours or less. But I've yet to get Kaldi working after several nights of effort. It seems much more powerful, and based on my reading, it's more accurate than Sphinx. But that doesn't do me any good if I can't get it to run, haha.

kleiba · on June 2, 2016

I was thinking the same. Kaldi's documentation is a bit lacking, and it's non-trivial to use for anything beyond the provided "recipes".

roel_v · on June 2, 2016

It's (a lot) more difficult to set up.