I’m really interested in this project too. Been thinking about similar solutions for a while now.
I looked into Kaldi and Mozilla DeepSpeech, but the former seems geared toward ASR experts and the latter didn’t seem suited to my particular application (longer recorded audio or a real-time stream).
Mozilla DeepSpeech has had streaming audio support for a few releases now, and its word error rate has also improved.
I would recommend looking at Vosk too. It converts speech to text much faster than Mozilla DeepSpeech while giving slightly better results: https://alphacephei.com/vosk/
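In case it helps with the long-recording / streaming use case, here is a minimal sketch of feeding audio to Vosk in chunks from Python. It assumes the `vosk` package is installed, a model has been downloaded and unpacked locally, and the input is 16-bit mono PCM WAV. The helper names (`transcribe_chunks`, `wav_chunks`, `transcribe_wav`) and the file paths are made up for illustration; only `Model`, `KaldiRecognizer`, `AcceptWaveform`, `Result` and `FinalResult` come from Vosk itself.

```python
import json
import wave


def transcribe_chunks(recognizer, chunks):
    """Feed raw PCM chunks to a Vosk recognizer and yield each finished utterance."""
    for chunk in chunks:
        # AcceptWaveform returns True once the recognizer has finalized an utterance.
        if recognizer.AcceptWaveform(chunk):
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                yield text
    # Flush whatever audio is still buffered at the end of the stream.
    text = json.loads(recognizer.FinalResult()).get("text", "")
    if text:
        yield text


def wav_chunks(path, frames_per_chunk=4000):
    """Yield successive blocks of frames from a WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data


def transcribe_wav(wav_path, model_path):
    """Transcribe a whole WAV file; both paths are placeholders you supply."""
    # Imported here so the helpers above can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    with wave.open(wav_path, "rb") as wf:
        sample_rate = wf.getframerate()
    recognizer = KaldiRecognizer(Model(model_path), sample_rate)
    return list(transcribe_chunks(recognizer, wav_chunks(wav_path)))
```

Because `transcribe_chunks` takes any iterable of byte chunks, the same loop should work for a live source (for example, blocks read from a microphone or a socket) as well as for a long file, since nothing requires the whole recording to be in memory at once.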