I have been playing around with whisper.cpp; it's nice because I can run the large model (quantized to 8 bits) at roughly real-time with cuBLAS on a Ryzen 2700 with a 1050Ti. I couldn't even run the PyTorch Whisper medium model on this card with X11 also running.
It blows me away that I can get real-time speech-to-text of this quality on a machine that is almost 5 years old.
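For anyone curious what driving it programmatically looks like, here is a minimal sketch against the whisper.cpp C API as I remember it from whisper.h (function names may have shifted between versions; the model path and the silent placeholder buffer are just stand-ins for real input):

    // Minimal whisper.cpp sketch: load a quantized model, run greedy decoding,
    // and print the recognized segments. Model path and audio are placeholders.
    #include "whisper.h"
    #include <stdio.h>

    int main(void) {
        struct whisper_context * ctx = whisper_init_from_file("models/ggml-large-q8_0.bin");
        if (ctx == NULL) {
            fprintf(stderr, "failed to load model\n");
            return 1;
        }

        // whisper expects 16 kHz mono float PCM; one second of silence stands in
        // for real audio here (normally you'd decode a WAV file or a mic stream).
        static float pcm[16000] = {0};

        struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

        if (whisper_full(ctx, params, pcm, 16000) == 0) {
            int n = whisper_full_n_segments(ctx);
            for (int i = 0; i < n; i++) {
                printf("%s\n", whisper_full_get_segment_text(ctx, i));
            }
        }

        whisper_free(ctx);
        return 0;
    }

Building with cuBLAS support enabled (historically something like `make WHISPER_CUBLAS=1`, though the flag has changed names across versions) is what gives the near-real-time large-model numbers above.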
Seconded. I was playing around with it for my native language (Polish), and the large model actually blew me away. For example, it spelled "przescreenować" correctly, which is an English word with a Polish prefix and a conjugated Polish suffix.
The really nice part of Whisper is being able to use it offline and on-device; Whisper Memos, by contrast, appears to upload your audio and notes to a server of unknown security and confidentiality.