I’ve been following along with whisper.cpp’s incredible progress.
It is a high quality piece of software which performs better on its intended hardware than any other implementation. It can easily be embedded anywhere. This is truly remarkable. A big shoutout to Georgi for this.
We need to remind ourselves that he is choosing to give this away by open-sourcing it, and he has gone to a lot of effort to make it easy to use and understand (just look at the documentation). To me, Georgi personifies every open-source author who puts their sweat and toil into something that benefits our entire community.
I was recently laid off and am currently trying to build some apps that could generate enough revenue to cover my costs for a few weeks. I built a transcription and dictation app for Mac [0] using whisper.cpp; the small model works really well for streaming (dictation) on a 2019 MBP and on an M1. It was really straightforward to use. However, the streaming algorithm isn't ready for production, so I implemented my own algorithm using VAD. I believe that, at this pace, this could also be fixed.
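The comment doesn't say how the VAD-based streaming works, so what follows is only a minimal sketch of one common approach, not the author's algorithm: a simple energy-threshold VAD that splits 16 kHz mono float PCM into speech segments at pauses, so each segment can be transcribed as a whole utterance instead of being cut mid-word. The frame length, `threshold`, `min_silence_ms`, and the function names are hypothetical choices.

```python
import numpy as np

# Hypothetical energy-threshold VAD; NOT the app author's actual algorithm.
SAMPLE_RATE = 16_000  # whisper.cpp models expect 16 kHz mono audio
FRAME_MS = 30         # hypothetical analysis frame length


def frame_energy(samples: np.ndarray, frame_len: int) -> np.ndarray:
    """Mean-square energy of each non-overlapping frame (tail samples dropped)."""
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(frames.astype(np.float64) ** 2, axis=1)


def vad_segments(samples: np.ndarray, threshold: float = 1e-3,
                 min_silence_ms: int = 300) -> list:
    """Return (start, end) sample indices of speech segments.

    A segment is closed once min_silence_ms of consecutive frames
    fall below the energy threshold; trailing silence is excluded.
    """
    frame_len = SAMPLE_RATE * FRAME_MS // 1000
    voiced = frame_energy(samples, frame_len) > threshold
    max_silent = max(1, min_silence_ms // FRAME_MS)

    segments = []
    start = None  # index of the first voiced frame of the open segment
    silent = 0    # consecutive silent frames since the last voiced frame
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= max_silent:
                end = i - silent + 1  # one past the last voiced frame
                segments.append((start * frame_len, end * frame_len))
                start, silent = None, 0
    if start is not None:  # audio ended while speech was still open
        end = len(voiced) - silent
        segments.append((start * frame_len, end * frame_len))
    return segments
```

Each `(start, end)` pair would then be sliced out of the buffer and handed to whisper.cpp (for example through `whisper_full`). A production VAD would typically use an adaptive threshold or a trained model rather than a fixed energy level.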
I hope you have some success with your app. Speaking of which, how is it doing? I keep hearing that the Mac App Store isn't very good, but what has your experience been? Do you get downloads and revenue?
Thanks for your good wishes. It's not going well, actually: there have been very few downloads after 10 days and no paying customers. The App Store seems like a desert; even though there are not many competitors in transcription, the app's impressions and page views are still very low. What I see is that people do their marketing outside the App Store.
The current crop of transcription apps are truly awful. Trying yours now!
One suggestion -- the trial is difficult to use if I have a long file, because it won't let me select a file that is longer than 5 min. Maybe let me pick a longer file but only transcribe the first 5 minutes? Most people aren't going to edit their files to accommodate your trial mode.
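The suggestion amounts to clamping the input instead of rejecting it. A minimal sketch, assuming 16 kHz mono float samples (the format whisper.cpp consumes) and a hypothetical `trial_clip` helper: 5 minutes is 300 s × 16,000 = 4,800,000 samples, and anything past that is dropped, with a flag so the UI can tell the user.

```python
import numpy as np

SAMPLE_RATE = 16_000          # whisper.cpp expects 16 kHz mono PCM
TRIAL_LIMIT_SECONDS = 5 * 60  # the 5-minute trial limit discussed in the thread


def trial_clip(samples: np.ndarray, sample_rate: int = SAMPLE_RATE):
    """Hypothetical helper: keep at most the first 5 minutes of audio.

    Returns (clip, truncated) so the UI can explain that the rest was skipped.
    """
    max_samples = TRIAL_LIMIT_SECONDS * sample_rate  # 300 * 16,000 = 4,800,000
    truncated = len(samples) > max_samples
    return samples[:max_samples], truncated
```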
Hey, thank you so much for trying the app. I will improve the free trial feature; you are especially right about the 5-minute limit. I went with the most basic solution first, but I will allow picking longer files in the next release. Thanks!
This is a cool app, and I was excited to try it. A couple of points of feedback:
- There is a popup every few seconds encouraging me to pay. Let me at least try it for a minute or two? I've seen the popup about 5 times while trying the app for just a couple minutes.
- Use the system light/dark mode instead of defaulting to dark mode
Hey, thank you for trying the app. These suggestions are very valuable to me. I am currently working on the paywall; in the beginning I went with the most basic solution. Now I will make it more useful for free users.
It seemed impressive enough to me, but I don't know what best-in-class looks like these days. Can anybody working in this area explain whether this is a significant milestone and what opportunities it might unlock? The consumer value proposition of basic speech-to-text input seems to be well handled by most major OSes, but I appreciate that's proprietary tech and only one use case.
It chunks the input into 5-second buffers and processes them independently. The results and performance are not great, but I think they're good enough for a proof of concept.
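The fixed-buffer scheme described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the demo's actual code: 16 kHz mono float samples, non-overlapping buffers, and a hypothetical `fixed_chunks` function; each buffer would be transcribed with no context carried over from the previous one.

```python
import numpy as np

SAMPLE_RATE = 16_000  # whisper.cpp models expect 16 kHz mono audio
CHUNK_SECONDS = 5     # the 5-second buffers mentioned in the thread


def fixed_chunks(samples: np.ndarray, sample_rate: int = SAMPLE_RATE,
                 chunk_seconds: int = CHUNK_SECONDS) -> list:
    """Hypothetical sketch: split audio into consecutive, non-overlapping buffers.

    The last buffer may be shorter. Boundaries ignore speech entirely,
    so a word spoken across a boundary is split between two buffers.
    """
    step = sample_rate * chunk_seconds  # 16,000 * 5 = 80,000 samples
    return [samples[i:i + step] for i in range(0, len(samples), step)]
```

Because the cut points are fixed in time rather than placed at pauses, words straddling a boundary are transcribed as two fragments, which is one plausible reason the results suffer; splitting on silence (as the VAD approach mentioned earlier in the thread does) avoids that.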
I highly recommend trying out https://whisper.ggerganov.com/talk/. It lets you talk to GPT-2 using your voice, all running locally in your browser. Holy cow.
They are pre-trained. This project runs a C++ port of the original OpenAI release [0].
From the OpenAI paper and release notes: "We are releasing models and inference code to serve as a foundation for further work on robust speech processing." So I guess they are either truly altruistic in this, or they are planning on monetising whatever they build on top of it.
Also, OpenAI is a startup (if we can call it that), so their value right now is more about being impressive and looking a lot like future value, as opposed to showing an immediate route to profit.
Running it in the latest Safari browser on iPhone, I get this error:
failed to asynchronously prepare wasm: CompileError: WebAssembly.Module doesn't parse at byte 5: can't get Function local's type in group 1, in function at index 9
Aborted(CompileError: WebAssembly.Module doesn't parse at byte 5: can't get Function local's type in group 1, in function at index 9)
I think we should make a standard browser API for transcribing; otherwise, every website that wants to implement private voice recognition will need to download 500MB of data.
Thank you Georgi. Salut, my friend!