
> open-source or paid SDK/API that I can use to create a group voice chat mobile app with "live" transcription? Or something that can plug-in to a system like this?

Yes. Google, Amazon, and Microsoft all offer streaming solutions (I wouldn't recommend Amazon's, though, and I might recommend Microsoft over Google). wav2letter from FB is the only open-source framework worth looking at; DeepSpeech is not a seriously usable framework.




Check out Kaldi. It's a toolkit rather than a ready-to-deploy service, but it has some solid pretrained models and recipes for training your own. You can use various existing projects for deployment, e.g. vosk-server (also for on-device), which comes with models for various languages and accents and has an excellent support channel via Telegram. Quite frankly, despite it not being "end-to-end", you'll get much, much better results in practice.


I collected custom audio and had it transcribed by hand for cash, then evaluated it on wav2letter and vosk. At least for that domain, wav2letter outperforms vosk.
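For anyone wanting to run the same kind of comparison: a rough sketch of scoring each engine's output against hand-made transcripts. It assumes word error rate (WER) is the metric, which the comment above does not actually state, and the function names here are made up for illustration. Getting the hypothesis text out of wav2letter or vosk is left out.

```python
from typing import Iterable, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    # Assumption: lowercasing and whitespace splitting is enough normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1], len(ref)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single utterance."""
    errors, n_words = word_errors(reference, hypothesis)
    if n_words == 0:
        raise ValueError("reference transcript must not be empty")
    return errors / n_words


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """WER over (reference, hypothesis) pairs: total errors / total reference words."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        errors, n_words = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += n_words
    if total_words == 0:
        raise ValueError("no reference words to score against")
    return total_errors / total_words
```

Run `corpus_wer` once per engine over the same set of reference transcripts and compare the two numbers. Pooling errors over the whole set, rather than averaging per-utterance WER, keeps short utterances from dominating the score.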


Good for you, it's the only way to know which tool works best in your case. I did the same for my use case and arrived at the opposite conclusion.

What most people don't realize is that whether any given model/algorithm works better depends heavily on your use case and domain.


Curious why you would not recommend Amazon... is it cost or something else?


For my use case, the quality was subpar compared to the other cloud providers.



