They might be interested in integrating Vosk, it's a speech-to-text engine that is just a shared library (.so file on Linux) and comes with API support for a variety of languages:
Still, I've found that the Big players have much better recognition models, and the post-processing that I assume they do (grammatical, maybe syntactical inferences that improve the end result) are probably much more powerful too.
https://alphacephei.com/vosk/
https://github.com/alphacep/vosk-api
Still, I've found that the Big players have much better recognition models, and the post-processing that I assume they do (grammatical, maybe syntactical inferences that improve the end result) are probably much more powerful too.