> open-source or paid SDK/API that I can use to create a group voice chat mobile app with "live" transcription? Or something that can plug-in to a system like this?
Yes, Google, Amazon, and Microsoft all offer streaming transcription services (I wouldn't recommend Amazon's, though, and I might recommend Microsoft over Google). wav2letter from FB is the only open-source framework worth looking at; deepspeech is not a seriously usable framework.
Check out Kaldi. It's a toolkit rather than a ready-to-deploy service, but it has some solid pretrained models and recipes for training your own. You can use various existing projects for deployment, e.g. vosk-server (which also works on-device); it comes with models for various languages and accents and has an excellent support channel on Telegram. Quite frankly, despite it not being "end-to-end", you'll get much, much better results in practice.
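For what "live" transcription looks like with Vosk's Python bindings, here is a rough sketch rather than a reference implementation. It assumes `pip install vosk`, a downloaded model directory, and 16-bit mono PCM input; `stream_transcribe` and `transcribe_wav` are names made up for this example. In a group voice chat you would presumably run one recognizer per speaker stream.

```python
import json
from typing import Iterable, Iterator, Tuple


def stream_transcribe(recognizer, chunks: Iterable[bytes]) -> Iterator[Tuple[str, str]]:
    """Feed PCM chunks to a Vosk-style recognizer.

    Yields ("partial", text) while an utterance is still in progress and
    ("final", text) once the recognizer decides the utterance has ended.
    """
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # Utterance boundary detected: Result() holds the finalized text.
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                yield "final", text
        else:
            # Still mid-utterance: PartialResult() holds the running hypothesis.
            text = json.loads(recognizer.PartialResult()).get("partial", "")
            if text:
                yield "partial", text
    # Flush whatever is still buffered once the audio ends.
    text = json.loads(recognizer.FinalResult()).get("text", "")
    if text:
        yield "final", text


def transcribe_wav(wav_path: str, model_dir: str, chunk_frames: int = 4000) -> None:
    """Print live-style output for a WAV file (hypothetical helper for this sketch)."""
    import wave
    # Imported lazily so stream_transcribe itself has no hard dependency on vosk.
    from vosk import Model, KaldiRecognizer

    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        # Read fixed-size chunks until readframes() returns an empty bytes object.
        chunks = iter(lambda: wf.readframes(chunk_frames), b"")
        for kind, text in stream_transcribe(rec, chunks):
            print(f"[{kind}] {text}")
```

In a real app the chunks would come from the microphone or the voice-chat audio stream instead of a file; the partial results are what you would show as the "live" caption, replacing them with the final text once it arrives.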
I collected custom audio, paid to have it transcribed by hand, and then evaluated both wav2letter and vosk on it. At least for that domain, wav2letter outperforms vosk.
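A comparison like this is usually scored with word error rate (WER) against the hand transcripts. The comment above doesn't say which metric was used, so treat the following as an assumed, minimal way to run such an evaluation yourself; `word_error_rate` is a name made up for this sketch, and real evaluations normally also normalize punctuation and numbers first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion (reference word missed)
                cur[j - 1] + 1,                        # insertion (extra hypothesis word)
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Run each engine over the same audio, compute this per utterance against the human transcript, and compare the averages (or better, total errors over total reference words).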