> open-source or paid SDK/API that I can use to create a group voice chat mobile...

woodson · on Nov 15, 2020

Check out Kaldi. It's a toolkit rather than a ready-to-deploy service, but it has some solid pretrained models and recipes for training your own. You can use various existing projects for deployment, e.g. vosk-server (also usable on-device), which comes with models for various languages and accents and has an excellent support channel on Telegram. Quite frankly, despite not being "end-to-end", you'll get much, much better results in practice.

whimsicalism · on Nov 15, 2020

I collected custom audio and had it transcribed by hand for cash, then evaluated it on wav2letter and vosk. At least for that domain, wav2letter outperforms vosk.

woodson · on Nov 15, 2020

Good for you; it's the only way to know which tool works best in your case. I did the same for my use case and arrived at the opposite conclusion.

What most people don't realize is that whether any given model/algorithm works better depends heavily on your use case and domain.
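For anyone who wants to run the same kind of comparison on their own hand-transcribed audio, here is a minimal sketch of scoring with word error rate (WER). Nobody in this thread says which metric they used, so WER is an assumption here, as are the function names and the toy transcripts; the text normalization (lowercasing and whitespace splitting) is deliberately crude and would likely need tuning for real data.

```python
from typing import Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """WER over (reference, hypothesis) pairs: total word errors / total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("need at least one reference word")
    return errors / ref_words


if __name__ == "__main__":
    # Hypothetical data: hand transcripts paired with each system's output.
    references = ["turn the lights off", "call me tomorrow morning"]
    system_a = ["turn the light off", "call me tomorrow morning"]
    system_b = ["turn the lights of", "call me to morrow morning"]
    print("system A WER:", corpus_wer(zip(references, system_a)))
    print("system B WER:", corpus_wer(zip(references, system_b)))
```

Running every candidate system over the same set of reference transcripts and comparing the resulting numbers is the point: a ranking obtained on someone else's domain may not carry over to yours.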

deskamess · on Nov 16, 2020

Curious why you would not recommend Amazon... is it cost or something else?

whimsicalism · on Nov 16, 2020

For my use case, the quality was subpar compared to the other cloud providers.