Whisper.cpp example running fully in the browser

sheepscreek · on Jan 23, 2023

I’ve been following along Whisper.com’s incredible progress.

It is a high quality piece of software which performs better on its intended hardware than any other implementation. It can easily be embedded anywhere. This is truly remarkable. A big shoutout to Georgi for this.

We need to remind ourselves that a part of him is choosing to give this away by open sourcing it. And he has gone through a lot of effort to make it easy to use and understand (just look at the documentation). Georgi, to me, personifies every open-source author who puts their sweat and toil into something that benefits our entire community.

Thank you Georgi. Salut, my friend!

johnmaguire · on Jan 23, 2023

I don't know what Whisper.com is but it doesn't seem related to this project.

remram · on Jan 23, 2023

Probably autocorrect (cpp->com)

sheepscreek · on Jan 24, 2023

Thanks - that’s right!

mcemilg · on Jan 23, 2023

I was recently laid off and am currently trying to build some apps that could bring in enough revenue to cover my costs for a few weeks. I built a transcription and dictation app for Mac [0] using whisper.cpp; the small model works really well on a 2019 MBP and an M1 for streaming (dictation). It was really straightforward to use; however, the streaming algorithm isn't ready for production, so I implemented my own algorithm using VAD. I believe that at this pace this could also be fixed.

[0] https://apple.co/3j2k8E7
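
For readers wondering what "streaming with VAD" could look like: below is a minimal energy-based voice activity detection sketch. It is not mcemilg's algorithm, which the thread does not describe; the function names, the 30 ms frame size, the RMS threshold, and the "hangover" of quiet frames are all assumptions chosen for illustration. The idea is to find spans of speech first and hand only those spans to the transcriber, rather than cutting the audio at fixed intervals.

```python
import math


def frame_rms(samples, frame_len):
    """Yield the RMS energy of each consecutive, non-overlapping full frame."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        yield math.sqrt(sum(x * x for x in frame) / frame_len)


def speech_segments(samples, sample_rate=16000, frame_ms=30,
                    threshold=0.02, hangover_frames=10):
    """Return (start, end) sample indices of regions judged to contain speech.

    A frame counts as speech when its RMS is at or above `threshold`
    (a hypothetical value; real audio needs tuning). A segment is closed only
    after `hangover_frames` consecutive quiet frames, so short pauses inside
    a sentence do not split it.
    """
    frame_len = sample_rate * frame_ms // 1000
    segments = []
    seg_start = None   # sample index where the open segment began
    quiet = 0          # consecutive quiet frames seen inside an open segment
    n_frames = 0
    for i, rms in enumerate(frame_rms(samples, frame_len)):
        n_frames = i + 1
        if rms >= threshold:
            if seg_start is None:
                seg_start = i * frame_len
            quiet = 0
        elif seg_start is not None:
            quiet += 1
            if quiet >= hangover_frames:
                # End the segment at the last frame that contained speech.
                segments.append((seg_start, (i + 1 - quiet) * frame_len))
                seg_start = None
                quiet = 0
    if seg_start is not None:
        # Audio ended while a segment was still open.
        segments.append((seg_start, (n_frames - quiet) * frame_len))
    return segments
```

Each returned span could then be passed to the transcriber as its own utterance. Production VADs are usually model-based rather than a plain energy threshold, so treat this only as an outline of the control flow.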

mrtksn · on Jan 23, 2023

I hope you have some success with your app. Speaking of which, how is it doing? I keep hearing that the Mac App Store isn't very good, but what has your experience been? Are you getting downloads and revenue?

mcemilg · on Jan 23, 2023

Thanks for your good wishes. It's not going well, actually: downloads are very few after 10 days, and there are no paying customers. The App Store seems like a desert. Even though there aren't many competitors in transcription, app impressions and page views are still very low. What I see is that people do their marketing outside the App Store.

robterrell · on Jan 23, 2023

The current crop of transcription apps are truly awful. Trying yours now!

One suggestion -- the trial is difficult to use if I have a long file, because it won't let me select a file that is longer than 5 min. Maybe let me pick a longer file but only transcribe the first 5 minutes? Most people aren't going to edit their files to accommodate your trial mode.

mcemilg · on Jan 23, 2023

Hey, thank you so much for trying the app. I will improve the free trial feature; you are right about the 5-minute limit in particular. I went with the most basic solution first, but I will allow picking longer files in the next release. Thanks!

trafnar · on Jan 23, 2023

This is a cool app; I was excited to try it. A couple of points of feedback:

- There is a popup every few seconds encouraging me to pay. Let me at least try it for a minute or two? I've seen the popup about 5 times while trying the app for just a couple minutes.

- Use the system light/dark mode instead of defaulting to dark mode

- Show language names instead of codes ("EN", etc.).

mcemilg · on Jan 23, 2023

Hey, thank you for trying the app. These suggestions are very valuable to me. I am currently working on the paywall; at the beginning I went with the most basic solution. Now I will make it more useful for free users.

boundlessdreamz · on Jan 24, 2023

Just cap the number of minutes they can transcribe per day or in total. Then you will only be asking the people who find real value in it.

urbandw311er · on Jan 23, 2023

Seemed impressive enough to me, but I don't know what the current best-in-class looks like these days. Can anybody working in this area explain if this is a significant milestone and what opportunities it might unlock? The consumer value proposition of basic speech-to-text input seems to be well-handled by most major OS's, but I appreciate that's proprietary tech and only one use case.

kevinak · on Jan 23, 2023

Whisper is way better than at least some other transcription solutions: https://blog.lopp.net/open-source-transcription-software-com...

gary_0 · on Jan 23, 2023

Wow, near-perfect transcription on desktop Firefox! Didn't seem to work on Android Chrome, though.

I wonder if this can be sped up using WebGPU...

7373737373 · on Jan 23, 2023

Does anyone know of a real-time version of this that can immediately transcribe individual words? It could be very useful for those who are hard of hearing.

ggerganov · on Jan 23, 2023

There is an attempt at real-time transcription from the microphone at:

https://whisper.ggerganov.com/stream/

It chunks the input into 5-second buffers and processes them independently. Results and performance are not great, but I think it is good enough for a proof of concept.
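
The fixed-size chunking described here can be sketched in a few lines. This is an illustration of the idea only, not the code behind the stream demo: `fixed_chunks`, `transcribe_stream`, and the `transcribe` callable are hypothetical names, and the 16 kHz sample rate is an assumption.

```python
def fixed_chunks(samples, sample_rate=16000, chunk_seconds=5):
    """Split samples into consecutive buffers of chunk_seconds each.

    The final buffer may be shorter than the others.
    """
    chunk_len = sample_rate * chunk_seconds
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def transcribe_stream(samples, transcribe, **chunk_args):
    """Run `transcribe` on each buffer independently and join the results.

    `transcribe` is a placeholder for whatever turns one buffer into text.
    No context is carried from one buffer to the next.
    """
    return " ".join(transcribe(chunk) for chunk in fixed_chunks(samples, **chunk_args))
```

Because every buffer is handled in isolation, a word that straddles a 5-second boundary is split across two independent runs, which is one plausible reason the results are described as "not great".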

TickleSteve · on Jan 23, 2023

Does it have to be open source? If not, the Pixel phones do an incredible job of real-time transcribing.

7373737373 · on Jan 24, 2023

Yeah, and local only processing

zachlatta · on Jan 24, 2023

I highly recommend trying out https://whisper.ggerganov.com/talk/. It lets you talk to GPT-2 using your voice, all running locally in your browser. Holy cow.

cloudking · on Jan 23, 2023

Very cool, it works for videos too. Parsed a 1-minute video with ~95% transcription accuracy.

samanator · on Jan 23, 2023

This is incredible! Thank you for sharing! Did OpenAI release these pretrained models, or was the training done separately along with this project?

If OpenAI releases the pretrained models, why would we use their service?

ycombinete · on Jan 23, 2023

They are pre-trained. This project is running a port of the original OpenAI release [0] to C++.

From the OpenAI paper and release notes: "We are releasing models and inference code to serve as a foundation for further work on robust speech processing." So I guess they are either truly altruistic in this, or they are planning on monetising whatever they build on top of it.

Also OpenAI is a startup (if we can call it that) so their value right now is more about being impressive, and looking a lot like future value; as opposed to showing an immediate route to profit.

[0] https://github.com/openai/whisper

edtechdev · on Jan 23, 2023

Would be interesting to see this connected to YouTube, to improve upon their auto-generated transcripts. There is this command-line version using youtube-dl and OpenAI's API: https://simonwillison.net/2022/Sep/30/action-transcription/

lsb · on Jan 23, 2023

Running in the latest Safari browser on iPhone, I get the error:

failed to asynchronously prepare wasm: CompileError: WebAssembly.Module doesn't parse at byte 5: can't get Function local's type in group 1, in function at index 9 Aborted(CompileError: WebAssembly.Module doesn't parse at byte 5: can't get Function local's type in group 1, in function at index 9)

ycombinete · on Jan 23, 2023

From the web page: "Important: your browser must support WASM SIMD instructions for this to work."

I don't think Safari on iOS supports this. [0]

[0] https://developer.apple.com/forums/thread/693067

quickthrower2 · on Jan 23, 2023

Me too, same browser

FloatArtifact · on Jan 24, 2023

My understanding is that each inference run requires 30 seconds of audio. Therefore anything shorter than 30 seconds is padded out with silence.

To my knowledge, nobody's been able to work around this, and it may not be possible without work upstream.
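
To make the padding concrete, here is a minimal sketch of fitting a clip into a fixed 30-second window. The 30-second figure comes from the comment above; the 16 kHz sample rate and the `pad_or_trim` helper are assumptions for illustration, not code taken from whisper.cpp.

```python
WINDOW_SECONDS = 30      # window length per inference run, per the comment above
SAMPLE_RATE = 16000      # assumed sample rate (not stated in the thread)
WINDOW_SAMPLES = WINDOW_SECONDS * SAMPLE_RATE


def pad_or_trim(samples, length=WINDOW_SAMPLES):
    """Return exactly `length` samples: trim long input, zero-pad short input."""
    if len(samples) >= length:
        return list(samples[:length])
    return list(samples) + [0.0] * (length - len(samples))


def padding_fraction(n_samples, length=WINDOW_SAMPLES):
    """Fraction of the window that is silence after padding."""
    return max(0, length - n_samples) / length
```

Under these assumptions, a 5-second clip fills one sixth of the window and the remaining five sixths are zeros, so a short clip costs as much compute per run as a full 30-second one.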

raybb · on Jan 23, 2023

If someone wants to self-host, you can also try this decent web interface: https://codeberg.org/pluja/web-whisper

I'm not the creator, just a fan.

jonatron · on Jan 23, 2023

This might help out the timestamp guy for very long videos/podcasts.

sheerun · on Jan 23, 2023

I think we should make a standard browser API for transcribing; otherwise each website that wants to implement private voice recognition will need to download 500MB of data.

NVI · on Jan 23, 2023

Perhaps we should call it Web Speech API.

NVI · on Jan 23, 2023

I was trolling, sorry.

This API already exists. It isn't nearly as good as Whisper.cpp (at least on macOS).

Docs: https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecog...

Demo: https://codepen.io/Rumyra/pen/NWLyLe

dessant · on Jan 23, 2023

An important limitation of the Web Speech API is that it only accepts audio from a microphone; you can't transcribe an audio file or a WebRTC call.

Semaphor · on Jan 23, 2023

FF Win for the small model: Uncaught DOMException: IDBObjectStore.put: The serialized value is too large (size=487614318 bytes, max=267386880 bytes).
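
The numbers in that error show the model is a little under twice the per-value limit (487614318 bytes against a maximum of 267386880). One conceivable workaround, which is an assumption here and not something the demo is known to do, would be to store the model as several smaller records, each under the limit. A sketch of the byte-range arithmetic, with `split_ranges` as a hypothetical helper:

```python
def split_ranges(total_bytes, max_bytes):
    """Return (start, end) byte ranges covering total_bytes, each at most max_bytes long."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return [(start, min(start + max_bytes, total_bytes))
            for start in range(0, total_bytes, max_bytes)]


# Sizes taken from the error message above.
MODEL_BYTES = 487614318
MAX_VALUE_BYTES = 267386880

parts = split_ranges(MODEL_BYTES, MAX_VALUE_BYTES)
```

With the sizes from the error, this yields two ranges, so two records would be enough if each part is stored and read back separately. Whether that actually avoids the limit in Firefox is untested here.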

danielovichdk · on Jan 23, 2023

The English model is really good. My native language, Danish, not so much.

jahnu · on Jan 23, 2023

Three clicks to find out what it is:

1: “Minimal whisper.cpp example running fully in the browser”

2: “Port of OpenAI's Whisper model in C/C++”

3: “Whisper is a general-purpose speech recognition model.”

visarga · on Jan 23, 2023

I think many people here are familiar with Whisper. For speech recognition, it is the most exciting news of the last year.