Show HN: Record voice memo, receive transcription in email

Void_ · on Oct 20, 2022

This is an iOS wrapper for the OpenAI Whisper library.

You can use the lock screen widget to start recording - allowing you to quickly capture what's on your mind.

This comment was the inspiration behind this: https://news.ycombinator.com/item?id=32928118

Pricing is currently somewhat high, until I figure out how much it costs to run Whisper on cloud GPUs. I'll adjust the pricing accordingly, and perhaps switch to a per-minute model.

alexarena · on Oct 20, 2022

This is cool! Curious if it would be possible to run the model on device?

n8cpdx · on Oct 20, 2022

Probably not realistic. On an M1 Pro MBP, Whisper runs far slower than real time. Think on the order of days for a 2 hour recording.

I’ve been doing transcription work for public meetings. Whisper is truly incredible in terms of error rate even in extremely challenging circumstances (obscure acronyms, unusual terms, unusual names, poor recording quality). I was seeing only a few errors per hour; most things that look like errors are in fact accurate representations of humans saying weird things. But I have to run it on my desktop with CUDA enabled. With the medium model it is, iirc, barely faster than real time. I only have a 1070, so maybe it is better with more modern hardware.

Whisper does also have some slightly strange behavior with silence and very long recordings. I might do a blog post once I’ve got more experience.

ggerganov · on Oct 20, 2022

On an M1 Pro, with the greedy decoder and the medium model, I can transcribe 1 hour of audio in just 10 minutes (~6x real-time) [0].

[0] https://github.com/ggerganov/whisper.cpp
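As a rough check on the "~6x real-time" figure, here is a minimal sketch of the arithmetic. The function name `realtime_speedup` is hypothetical and is not part of whisper.cpp or Whisper; it only divides audio length by processing time.

```python
def realtime_speedup(audio_minutes: float, processing_minutes: float) -> float:
    """Return how many times faster than real time a transcription ran.

    A value above 1.0 means faster than real time; below 1.0 means slower.
    """
    if processing_minutes <= 0:
        raise ValueError("processing_minutes must be positive")
    return audio_minutes / processing_minutes


# Figures quoted above: 1 hour of audio transcribed in about 10 minutes.
print(realtime_speedup(60, 10))  # 6.0
```

The same calculation applies to any of the timings reported in this thread, as long as both durations are in the same unit.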

Multicomp · on Oct 20, 2022

I just transcribed a 32 minute audio recording of someone doing a speech that someone recorded using their phone mic.

I used default settings of "import audio file" with the Buzz application, and it was transcribed in less than 10 minutes. 24KB text file or so.

I'm on a Windows PC with an AMD Ryzen 3.

tobr · on Oct 20, 2022

There were at least two errors in the video demo, and that was just 15 seconds of audio. “I can take some notes from a meeting” was transcribed as “I can take some notes from meeting”, and “I click stop [recording]” ended up as “And click the stop”.

ggerganov · on Oct 20, 2022

Me too - I recently ported the model to plain C/C++ and I am now planning to run it on device and see if the performance is any good. Will post an update when/if it works out

Multicomp · on Oct 20, 2022

For PCs there is the Buzz application: https://github.com/chidiwilliams/buzz/tree/main

Void_ · on Oct 20, 2022

I suppose the built-in iOS voice recognition would be better for that.

I haven't really compared those two properly. Wonder how much better Whisper is.

AndrewKemendo · on Oct 20, 2022

Apple will keep up with anything that's SOTA, just with a bit of a lag - so just expect they will be better soon, if not already.

Word of warning from someone who built an SDK that filled in a processing gap that Apple had (6DOF Monocular SLAM) [1]: Apple will eventually make your technology obsolete and their version will be way better. See: ARKit

We open sourced it once ARKit came out because there was no way to monetize it further

[1] https://github.com/Pair3D/PairSDK

n8cpdx · on Oct 20, 2022

Whisper is a game changer in terms of accuracy. It makes Zoom, YouTube, Office/Azure, Descript, and Otter.ai transcription look like jokes in comparison.

The step change in transcription accuracy here is significant enough to cross an important threshold for usefulness.

rexreed · on Oct 20, 2022

If you're not running it on Cloud GPUs right now and it's not on device, how is it running the Whisper library now?

O__________O · on Oct 20, 2022

How did you select your GPU provider?

trafnar · on Oct 20, 2022

I like that this is so focused on quickly initiating the recording, then just emailing the transcript, rather than trying to keep me in the app. That's exactly how I'd like this type of app to function.

barbazoo · on Oct 20, 2022

That's how they all start. Wait until there's VC money involved.

Void_ · on Oct 20, 2022

Haha this is just a side project - I'm co-founder of Reflect (https://reflect.app) and more likely we'll be integrating this functionality into our note-taking app.

barbazoo · on Oct 20, 2022

Cool, yeah that sounds like a logical thing to do. Good luck.

I wish whispermemos was available for Android. I have to look into what's available and what the privacy implications are. Would love to use something like that. A Joplin plugin would be awesome.

djbusby · on Oct 20, 2022

On Android one can just use the Microphone input from the keyboard. Just tap and instant transcript, in whatever you like (I use a web-based knowledge thing).

Tap Icon on home screen, tap new, tap Microphone, talk for a few minutes, press save.

Why an app for a built-in feature?

RandomWorker · on Oct 20, 2022

It going to your email is quite useful for those who have email as the primary method of collecting work and notes. It's like that old BlackBerry concept of the universal inbox: everything just goes to one place and gets collected there.

For those with that workflow this is a really cool app.

layer8 · on Oct 20, 2022

On iOS you can define a two-step Shortcut that does this (type or dictate text and then send it by email to a predefined address [0]). I imagine there are similar facilities on Android.

[0] “Ask for Input” followed by “Send Email”, deactivate “Show Compose Sheet” on the latter. Then use the microphone symbol for speech recognition on the initial text input.

happyllama · on Oct 20, 2022

This seems great! I’m excited to try it out. It reminds me of Voiceliner, which I’ve been using for a while now. Voiceliner does on-device transcription and allows for creating a hierarchy of notes / ideas, with export functions too.

https://news.ycombinator.com/item?id=29726787

giantg2 · on Oct 20, 2022

Damn, I wish I knew about Vosk (which Voiceliner uses) a few months ago. I was struggling with Sphinx and couldn't get it to have any accuracy. I will definitely check this out.

ivan_ah · on Oct 21, 2022

+1 for Voiceliner, which has been very useful for me too (heard about it via HN). It works pretty well on iPhone, haven't tested on Android.

topicseed · on Oct 20, 2022

https://recorder.google.com/ is pretty amazing to me and is accessible from mobile and web. Not sure if it's Pixel-specific or not.

notRobot · on Oct 20, 2022

It is Pixel-exclusive.

> Recordings backed up from the Recorder app on your Google Pixel will appear here.

Visited on a non-Pixel Android device.

topicseed · on Oct 21, 2022

Oh, well, it's actually an awesome feature/app.

xenonite · on Oct 20, 2022

Let me mention Just Press Record, which performs the transcription on iOS and macOS with Apple’s speech recognition, which is on-device for several languages.

https://apps.apple.com/app/id1033342465

petercooper · on Oct 20, 2022

Seconded - one big win with Just Press Record is you can add it as a complication on an Apple Watch, so you can hit a button anywhere you are, leave a voice memo, and it's transcribed and on all your iCloud devices.

pwinnski · on Oct 20, 2022

I'm trying to resist a "No wireless. Less space than a Nomad. Lame." response, but iOS already does voice-to-text, so it's hard to see what value-add this offers. I think the comment which inspired you was from an Android user? Or at least not an iOS 16 user, where the voice dictation is much improved.

Single-click launch from the lock screen? That's neat, I don't think I can get there with bare iOS.

No real-time feedback as I speak, so that seems like a point to iOS.

The email is a nice touch, but the trade-off is that you have a copy of all of my recordings now.

Finally, it could be that OpenAI Whisper does a much better job than iOS. I have no idea!

Void_ · on Oct 20, 2022

Haha, you are totally right. I was mainly trying to have fun with a new technology.

The email thing makes a lot of difference for me - I don't wanna fiddle with my phone any more than just pressing one button.

I'll do the rest on my computer. (Copying to Reflect.app, creating a todo, etc.)

jcbages · on Oct 20, 2022

It looks so cool! I'd love to have the transcript sent to places other than email. I'm thinking of either a chat app like WhatsApp or Telegram, or a productivity app like Notion to keep a more organized journal.

sriram_malhar · on Oct 20, 2022

Very nice indeed. Thanks for sharing.

You mention that you use whisper on the cloud. I was wondering what you think of on-device decoding using the base.en model and tflite.

Something along the lines of:

https://colab.research.google.com/github/usefulsensors/opena...

mtVessel · on Oct 20, 2022

It'd be nice if you linked to a privacy policy on that page. The policy linked to on the app store page isn't very specific, either.

simonebrunozzi · on Oct 27, 2022

Good stuff, but not for me. I need a macOS app that uses my M1 (soon M2) and does this thing without any extra cost.

mellosouls · on Oct 20, 2022

This looks cool but I think we need a tech equivalent of "...IN MICE" warning for Apple-only products on this site.

btw there are cross-platform apps which have some similar functionality, e.g. Voiceliner

https://a9.io/voiceliner/

rmateu · on Oct 22, 2022

Subscribed to monthly while I figure out if it fits into my workflow.

Anecdotally, it provides higher accuracy in a more relaxed style of narration than Just Press Record or native dictation.

Either way, thanks for sharing. Hoping the side app is working out.

markozivanovic · on Oct 20, 2022

Great job! I love that you only need to press the record and stop on the phone and the rest is done automagically in the background.

I wonder if there's a thing for Android that's as accurate as Whisper!?

Void_ · on Oct 20, 2022

Somebody could build an Android frontend - and use the same OpenAI Whisper library on the server.

I didn't bother with React Native, thinking audio recording would be a pain in the butt.

mighty_donkey · on Oct 20, 2022

Is there a limit on how much I can record? Is this just for quick notes or could I record a 30-minute meeting? Thanks and congrats on the launch.

Void_ · on Oct 20, 2022

Currently 10m limit, but I’ll be testing more.

swah · on Oct 20, 2022

I do something similar with a Telegram bot... but the quality is only good enough to let me know if I should listen to that audio again.

sattoshi · on Oct 20, 2022

What are the “in-app purchases”?

Void_ · on Oct 20, 2022

There’s a subscription to use it beyond 20 free memos. Running these models is pretty expensive!

j45 · on Oct 20, 2022

I am aware of voice memo apps running a library like Vosk that transcribe just fine on device instead of in the cloud, and without cost.

Void_ · on Oct 20, 2022

Send a link, would love to try it.

j45 · on Oct 21, 2022

https://github.com/alphacep/vosk-api

Your app is neat in that it can record from the Lock Screen. I was curious to try out the new OpenAI model.

Too often, iOS has a problem of too many clicks to do the most basic of things.

mritchie712 · on Oct 20, 2022

perfect. signed up.

nice job!