Hacker News

Slightly OT:

I have been playing around with whisper.cpp; it's nice because I can run the large model (quantized to 8 bits) at roughly real time with cuBLAS on a Ryzen 2700 with a 1050 Ti. I couldn't even run the PyTorch Whisper medium model on this card with X11 also running.

It blows me away that I can get real-time speech-to-text of this quality on a machine that is almost 5 years old.



Seconded. I was playing around with it for my native language (Polish) and the large models actually blew me away. For example, it spelled "przescreenować" correctly, which is an English word with a Polish prefix and a conjugated suffix.


Is there a beginner's guide to getting started with any of these?


Have you tried the Quick start in the https://github.com/ggerganov/whisper.cpp README?


This is an impressive use case.


Is it possible to run it on Apple M1 devices or mobile phones, or not yet?


I can recommend the MacWhisper app if you prefer a GUI.


And Whisper Memos for iOS https://whispermemos.com/


The really nice part of Whisper is being able to use it offline and on-device. It seems Whisper Memos is uploading your audio and notes to a server of unknown security, confidentiality, etc.

I like Aiko for on-device transcription on both macOS and iOS: https://apps.apple.com/us/app/aiko/id1672085276


Whisper Memos uses the OpenAI API. The upside is that it uses the largest model, which would take 2 GB on your iPhone.


Yeah, the whisper.cpp GitHub page has a demo for both. I have used it on my M1 MBA for the past few months.



