
Any reason not to use Kaldi (https://github.com/kaldi-asr/kaldi)? AFAIK it uses most of the state-of-the-art algorithms.



Frankly, Kaldi is nearly impossible for mere mortals to use. It's 100% targeted at people doing PhD work in speech recognition who have a colleague who already knows how it works and can set it up for them.

IMHO, there is a big opportunity for someone to come along and repackage it in a user-friendly way, but the people who actually understand it are too busy doing "real work" to bother with such frivolity.


Like others are saying, it's just much harder to use. The official tutorial even says "The intended audience for this tutorial is either speech recognition researchers, or graduates or advanced undergraduates who are studying this area anyway." in the first paragraph. It seems like Kaldi is meant for people who actually know how speech recognition works, while other tools are meant for people who just want some text from some audio without really understanding how.

For example, I've been playing with home automation and speech recognition, and have been able to get any Sphinx-based recognizer working in a single sitting, in a few hours or less. But I've yet to get Kaldi working after several nights of effort. It seems much more powerful, and based on my reading, it's more accurate than Sphinx. But that doesn't do me any good if I can't get it to run, haha.


I was thinking the same. Kaldi's documentation is a bit lacking, and it's non-trivial to use for anything beyond their provided "recipes".


It's (a lot) more difficult to set up.



