The feature I want is speaker differentiation - I want to feed in an audio file ...

runeb · 2025-03-20T20:10:53 1742501453

I've had good results in the past with pyannote for that use case, using this model: https://huggingface.co/pyannote/speaker-diarization-3.1

infecto · 2025-03-20T20:24:28 1742502268

I thought Deepgram already did speaker diarization (the usual term for speaker differentiation) pretty well. It can also include timestamps and other metadata.

thot_experiment · 2025-03-20T21:39:18 1742506758

WhisperX does all of this: both speaker differentiation and individual word timestamps. I use it all the time to transcribe meeting notes.
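
For anyone wondering how word timestamps and speaker labels get combined into one transcript, here's a rough sketch of the general idea: give each word the speaker whose turn overlaps it the most. This is not WhisperX's actual code or API. The `Word` and `Turn` types and the function names are made up for illustration, and the input data is hand-written rather than real model output.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """A transcribed word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class Turn:
    """A diarization segment: who spoke, and when (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn at all get None as their speaker.
    """
    labeled = []
    for w in words:
        best_speaker = None
        best_overlap = 0.0
        for t in turns:
            o = overlap(w.start, w.end, t.start, t.end)
            if o > best_overlap:
                best_overlap = o
                best_speaker = t.speaker
        labeled.append((w.text, best_speaker))
    return labeled


def to_transcript(labeled: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Group consecutive words from the same speaker into 'SPEAKER: text' lines."""
    lines: List[str] = []
    current_speaker = None
    current_words: List[str] = []
    for text, speaker in labeled:
        if current_words and speaker != current_speaker:
            lines.append(f"{current_speaker}: {' '.join(current_words)}")
            current_words = []
        current_speaker = speaker
        current_words.append(text)
    if current_words:
        lines.append(f"{current_speaker}: {' '.join(current_words)}")
    return lines


if __name__ == "__main__":
    # Hand-written sample data standing in for real ASR and diarization output.
    words = [Word("hi", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hello", 1.2, 1.6)]
    turns = [Turn("SPEAKER_00", 0.0, 1.0), Turn("SPEAKER_01", 1.0, 2.0)]
    for line in to_transcript(assign_speakers(words, turns)):
        print(line)
```

Real tools have to handle messier cases (overlapping speech, words that fall in gaps between turns), so treat this as the basic shape of the merge step only.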