Check out Kaldi. It's a toolkit rather than a ready-to-deploy service, but it has some solid pretrained models and recipes for training your own. You can use various existing projects for deployment, e.g. vosk-server (which also supports on-device use), which comes with models for various languages and accents and has an excellent support channel on Telegram. Quite frankly, despite not being "end-to-end", you'll get much, much better results in practice.



I collected custom audio and had it transcribed by hand for cash, then evaluated it on wav2letter and vosk. At least for that domain, wav2letter outperforms vosk.
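A comparison like this is usually scored with word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between each hand transcript and the system's output, divided by the number of words in the hand transcripts. The comment doesn't say which metric was used, so WER here is an assumption. Below is a minimal sketch in plain Python; the function names are hypothetical and the normalization (lowercasing plus whitespace splitting) is deliberately simplistic. Real evaluations usually also strip punctuation and normalize numbers.

```python
from typing import Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # At the start of iteration i, prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hyp
                curr[j - 1] + 1,     # insertion: extra word in hyp
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per reference")
    errors = 0
    words = 0
    for ref_text, hyp_text in zip(references, hypotheses):
        # Assumed minimal normalization: lowercase, then split on whitespace.
        ref = ref_text.lower().split()
        hyp = hyp_text.lower().split()
        errors += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words
```

To compare two engines, score each one against the same hand-transcribed set, e.g. `corpus_wer(refs, wav2letter_hyps)` versus `corpus_wer(refs, vosk_hyps)` (both hypothesis lists are hypothetical placeholders). The lower WER wins for that domain. Pooling errors across the whole set, rather than averaging per-utterance WER, keeps short clips from dominating the score.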


Good for you, it's the only way to know which tool works best in your case. I did the same for my use case and arrived at the opposite conclusion.

What most people don't realize is that whether any given model/algorithm works better depends heavily on your use case and domain.