Whisper works ... kinda. I'm hoping another set of models gets released at some point. The error rate isn't appalling to me because I'm transcribing TV shows and radio shows for personal use, so it's not mission critical.
There are a few whisper diarization "projects", but I've never been able to get any of them to work. Whisper does have word-level timestamps, so it should be simple to "plug in" diarization.
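Roughly what I mean, as a sketch (this is my own glue code, not from any of those projects; the model names, the Hugging Face token, and the "assign each word to whichever speaker turn contains its midpoint" rule are all my assumptions):

```python
# sketch: whisper word timestamps + pyannote diarization glued together
# (hypothetical glue code; "episode.wav" and "HF_TOKEN" are placeholders)
import whisper
from pyannote.audio import Pipeline

AUDIO = "episode.wav"

# 1. transcribe with word-level timestamps
model = whisper.load_model("large-v2")
result = model.transcribe(AUDIO, word_timestamps=True)

# 2. diarize with pyannote (needs a Hugging Face access token)
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0", use_auth_token="HF_TOKEN"
)
diarization = pipeline(AUDIO)
turns = [
    (turn.start, turn.end, speaker)
    for turn, _, speaker in diarization.itertracks(yield_label=True)
]

# 3. assign each word to the speaker turn containing its midpoint
def speaker_at(t):
    for start, end, speaker in turns:
        if start <= t <= end:
            return speaker
    return "UNKNOWN"

for segment in result["segments"]:
    for word in segment.get("words", []):
        mid = (word["start"] + word["end"]) / 2
        print(f'{speaker_at(mid)}: {word["word"].strip()}')
```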
I don't need an LLM or whatever this project has, but I'll see if it's runnable and whether it's any better than what a couple of podcasts I listen to use.
edit: I see some people mentioning whisperX, which is one of those things that was cool until moving fast broke things:
>As of Oct 11, 2023, there is a known issue regarding slow performance with pyannote/Speaker-Diarization-3.0 in whisperX. It is due to dependency conflicts between faster-whisper and pyannote-audio 3.0.0. Please see this issue for more details and potential workarounds.
which means the ~3x increase in large-v2 speed I'd gain is instantly lost to diarization, unless I track down workarounds for an eight-month-old bug.
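For context, that ~3x number is just what plain faster-whisper buys you on its own, no diarization on top. A sketch of what that looks like ("large-v2" on CUDA with float16 is my assumption; swap in compute_type="int8" on CPU):

```python
# sketch: plain faster-whisper, no diarization
from faster_whisper import WhisperModel

# "large-v2" on CUDA with float16 is an assumption; adjust for your hardware
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

# transcribe() returns a lazy generator of segments plus language info
segments, info = model.transcribe("episode.wav", word_timestamps=True)

print(f"detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```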
I'll stick with the py venv whisper install I've been using for the last 16 months, tyvm.
30 years of audio that needs transcripts, summaries, and worksheets made out of it.