Show HN: Record voice memo, receive transcription in email (whispermemos.com)
76 points by Void_ on Oct 20, 2022 | 47 comments



This is an iOS wrapper for the OpenAI Whisper library.

You can use the lock screen widget to start recording - allowing you to quickly capture what's on your mind.

This comment was the inspiration behind this: https://news.ycombinator.com/item?id=32928118

Pricing is somewhat high for now, until I figure out how much it costs to run Whisper on cloud GPUs. I'll adjust the pricing accordingly, and perhaps switch to a per-minute model.
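If pricing does move to a per-minute model, one common approach is to round each memo up to whole minutes and multiply by a rate. A minimal sketch, assuming round-up billing and a hypothetical rate argument (neither is stated in the post):

```python
import math


def per_minute_charge_cents(duration_seconds: float, rate_cents_per_minute: int) -> int:
    """Charge for one memo under a hypothetical per-minute plan.

    Partial minutes are rounded up, so a 61-second memo bills as 2 minutes.
    The rate is a placeholder argument, not the app's actual pricing.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    billable_minutes = math.ceil(duration_seconds / 60)
    return billable_minutes * rate_cents_per_minute


# A 61-second memo at a made-up rate of 5 cents per minute.
print(per_minute_charge_cents(61, 5))  # 10
```

Rounding up keeps very short memos from costing nothing; per-second billing would be the finer-grained alternative.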


This is cool! Curious if it would be possible to run the model on device?


Probably not realistic. On an M1 Pro MBP, Whisper runs far slower than real time. Think on the order of days for a 2-hour recording.

I’ve been doing transcription work for public meetings. Whisper is truly incredible in terms of error rate, even in extremely challenging circumstances (obscure acronyms, unusual terms, unusual names, poor recording quality). I was seeing only a few errors per hour; most things that look like errors are in fact accurate representations of humans saying weird things. But I have to run it on my desktop with CUDA enabled. With the medium model it is, IIRC, barely faster than real time. I only have a GTX 1070, so maybe it is better with more modern hardware.

Whisper also has some slightly strange behavior with silence and with very long recordings. I might do a blog post once I’ve got more experience.


On an M1 Pro, with the greedy decoder and the medium model, I can transcribe 1 hour of audio in just 10 minutes (~6x real time) [0].

[0] https://github.com/ggerganov/whisper.cpp
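The "~6x real time" figure is simply audio duration divided by processing time. A small sketch of that arithmetic, using the numbers from the comment above; the function name is illustrative:

```python
def speedup_vs_real_time(audio_minutes: float, processing_minutes: float) -> float:
    """How many times faster than playback a transcription run was.

    Values above 1.0 mean the audio was transcribed faster than real time.
    """
    if processing_minutes <= 0:
        raise ValueError("processing time must be positive")
    return audio_minutes / processing_minutes


# 1 hour of audio transcribed in 10 minutes, as reported above.
print(speedup_vs_real_time(60, 10))  # 6.0
```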


I just transcribed a 32-minute recording of someone giving a speech, captured with a phone mic.

I used the default settings of "import audio file" in the Buzz application, and it was transcribed in less than 10 minutes, producing a text file of roughly 24 KB.

I'm on a Windows PC with an AMD Ryzen 3.


There were at least two errors in the video demo, and that was just 15 seconds of audio. “I can take some notes from a meeting” was transcribed as “I can take some notes from meeting”, and “I click stop [recording]” ended up as “And click the stop”.
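These slips are what word error rate (WER) counts: the word-level edit distance (substitutions, deletions, insertions) between a reference and the transcript, divided by the number of reference words. A minimal sketch using the standard Levenshtein dynamic program; it splits on whitespace and does no case or punctuation normalization, which real evaluations usually add:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous DP row: distances from ref[:i-1] to each prefix of hyp.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


ref = "I can take some notes from a meeting"
hyp = "I can take some notes from meeting"
print(word_error_rate(ref, hyp))  # one deletion over 8 reference words: 0.125
```

By this measure the second example, “I click stop” versus “And click the stop”, is one substitution plus one insertion, or 2 errors over 3 reference words.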


Me too - I recently ported the model to plain C/C++, and I am now planning to run it on device and see if the performance is any good. Will post an update when/if it works out.


For PCs there is the Buzz application: https://github.com/chidiwilliams/buzz/tree/main


I suppose the built-in iOS voice recognition would be better for that.

I haven't really compared those two properly. Wonder how much better Whisper is.


Apple will keep up with anything that's SOTA, just with a bit of a lag, so expect them to be better soon if they aren't already.

Word of warning from someone who built an SDK that filled a processing gap Apple had (6DOF monocular SLAM) [1]: Apple will eventually make your technology obsolete, and their version will be way better. See: ARKit.

We open-sourced it once ARKit came out because there was no way to monetize it further.

[1] https://github.com/Pair3D/PairSDK


Whisper is a game changer in terms of accuracy. It makes Zoom, YouTube, Office/Azure, Descript, and Otter.ai transcription look like jokes in comparison.

The step change in transcription accuracy here is significant enough to cross an important threshold for usefulness.


If you're not running it on cloud GPUs right now and it's not on device, how are you running the Whisper library?


How did you select your GPU provider?


I like that this is so focused on quickly initiating the recording, then just emailing the transcript, rather than trying to keep me in the app. That's exactly how I'd like this type of app to function.


That's how they all start. Wait until there's VC money involved.


Haha, this is just a side project. I'm a co-founder of Reflect (https://reflect.app), and we'll more likely be integrating this functionality into our note-taking app.


Cool, yeah that sounds like a logical thing to do. Good luck.

I wish whispermemos was available for Android. I have to look into what's available and what the privacy implications are. Would love to use something like that. A Joplin plugin would be awesome.


On Android you can just use the microphone input on the keyboard: tap it and get an instant transcript in whatever app you like (I use a web-based knowledge tool).

Tap the icon on the home screen, tap New, tap the microphone, talk for a few minutes, press Save.

Why an app for a built-in feature?


Having it go to your email is quite useful for those whose email is their primary place for collecting work and notes. It's like the old BlackBerry concept of the universal inbox: everything goes to one place and gets collected there.

For those with that workflow, this is a really cool app.


On iOS you can define a two-step Shortcut that does this (type or dictate text and then send it by email to a predefined address [0]). I imagine there are similar facilities on Android.

[0] “Ask for Input” followed by “Send Email”, with “Show Compose Sheet” deactivated on the latter. Then tap the microphone symbol on the initial text input to use speech recognition.


This seems great! I’m excited to try it out. It reminds me of Voiceliner, which I’ve been using for a while now. Voiceliner does on-device transcription, lets you arrange notes and ideas hierarchically, and has export functions too.

https://news.ycombinator.com/item?id=29726787


Damn, I wish I had known about Vosk (which Voiceliner uses) a few months ago. I was struggling with Sphinx and couldn't get it to be accurate at all. I will definitely check this out.


+1 for Voiceliner, which has been very useful for me too (heard about it via HN). It works pretty well on iPhone, haven't tested on Android.


https://recorder.google.com/ is pretty amazing to me and is accessible from mobile and web. Not sure if it's Pixel-specific or not.


It is Pixel-exclusive.

> Recordings backed up from the Recorder app on your Google Pixel will appear here.

That's what I get when visiting on a non-Pixel Android device.


Oh, well, it's actually an awesome feature/app.


Let me mention Just Press Record, which performs the transcription on iOS and macOS with Apple’s speech recognition, which is on-device for several languages.

https://apps.apple.com/app/id1033342465


Seconded - one big win with Just Press Record is you can add it as a complication on an Apple Watch, so you can hit a button anywhere you are, leave a voice memo, and it's transcribed and on all your iCloud devices.


I'm trying to resist a "No wireless. Less space than a Nomad. Lame." response, but iOS already does voice-to-text, so it's hard to see what value this adds. I think the comment that inspired you was from an Android user? Or at least not an iOS 16 user, where voice dictation is much improved.

Single-click launch from the lock screen? That's neat, I don't think I can get there with bare iOS.

No real-time feedback as I speak, so that seems like a point to iOS.

The email is a nice touch, but the trade-off is that you have a copy of all of my recordings now.

Finally, it could be that OpenAI Whisper does a much better job than iOS. I have no idea!


Haha, you are totally right. I was mainly trying to have fun with a new technology.

The email thing makes a lot of difference for me - I don't wanna fiddle with my phone any more than just pressing one button.

I'll do the rest on my computer. (Copying to Reflect.app, creating a todo, etc.)
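The email step can be sketched with Python's standard `email` package. This is an illustration only: the thread doesn't say how whispermemos.com builds or sends its messages, and the addresses and subject format below are made up:

```python
from email.message import EmailMessage


def build_transcript_email(transcript: str, sender: str, recipient: str) -> EmailMessage:
    """Wrap a finished transcript in a plain-text email.

    Sender, recipient and the subject format are placeholders for illustration.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Use the first few words of the memo as the subject so the inbox stays scannable.
    preview = " ".join(transcript.split()[:8])
    msg["Subject"] = f"Voice memo: {preview}" if preview else "Voice memo"
    msg.set_content(transcript)
    return msg


msg = build_transcript_email(
    "Remember to email the design review notes to the team.",
    sender="memos@example.com",    # hypothetical address
    recipient="me@example.com",    # hypothetical address
)
print(msg["Subject"])
```

Actually delivering it would then be something like `smtplib.SMTP(host).send_message(msg)` against whatever mail server or provider you use; the host is deliberately left out here.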


It looks so cool! I'd love to have the transcript sent to places other than email. I'm thinking of either a chat app like WhatsApp or Telegram, or a productivity app like Notion to keep a more organized journal.


Very nice indeed. Thanks for sharing.

You mention that you run Whisper in the cloud. I was wondering what you think of on-device decoding using the base.en model and TFLite.

Something along the lines of:

https://colab.research.google.com/github/usefulsensors/opena...


It'd be nice if you linked to a privacy policy on that page. The policy linked to on the app store page isn't very specific, either.


Good stuff, but not for me. I need a macOS app that uses my M1 (soon M2) and does this without any extra cost.


This looks cool, but I think we need a tech equivalent of the "...IN MICE" warning for Apple-only products on this site.

BTW, there are cross-platform apps with some similar functionality, e.g. Voiceliner:

https://a9.io/voiceliner/


Subscribed to the monthly plan while I figure out whether it fits my workflow.

Anecdotally, with a more relaxed style of narration it gives higher accuracy than Just Press Record or native dictation.

Either way, thanks for sharing. Hope the side project is working out.


Great job! I love that you only need to press the record and stop on the phone and the rest is done automagically in the background.

I wonder if there's a thing for Android that's as accurate as Whisper!?


Somebody could build an Android frontend and use the same OpenAI Whisper library on the server.

I didn't bother with React Native, thinking audio recording would be a pain in the butt.


Is there a limit on how much I can record? Is this just for quick notes, or could I record a 30-minute meeting? Thanks, and congrats on the launch.


Currently there's a 10-minute limit, but I’ll be testing more.


I do something similar with a Telegram bot... but the quality is only good enough to let me know if I should listen to that audio again.


What are the “in-app purchases”?


There’s a subscription to use it beyond 20 free memos. Running these models is pretty expensive!


I know of voice memo apps that run a library like Vosk and transcribe just fine on device instead of in the cloud, at no cost.


Send a link, would love to try it.


https://github.com/alphacep/vosk-api

Your app is neat in that it can record from the Lock Screen. I was curious to try out the new OpenAI model.

Too often, iOS has a problem of too many clicks to do the most basic of things.


Perfect. Signed up.

Nice job!



