Show HN: SpeechBoard – Edit Podcasts from the Transcript

craigcannon · on Nov 10, 2017

Hey HN!

Craig from YC here. Ramon Recuero and I built SpeechBoard.

Here's how it works: you record a podcast and upload it to SpeechBoard. We run it through a few speech to text APIs to generate a transcription for you. From there you can delete words from the transcript and we cut them from the audio.
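As a rough illustration of the cutting step (not SpeechBoard's actual code), here is a minimal sketch. It assumes the speech-to-text step returns word-level timestamps, and it computes which spans of audio survive once some words are deleted. The `keep_segments` name, the `(text, start, end)` tuple layout, and the sample values in the note below are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical word record: (text, start_seconds, end_seconds)
Word = Tuple[str, float, float]


def keep_segments(words: List[Word], deleted: Set[int], total: float) -> List[Tuple[float, float]]:
    """Return the (start, end) spans of audio that remain after cutting deleted words.

    `deleted` holds indices into `words`; `total` is the length of the audio in seconds.
    Assumes `words` is sorted by start time.
    """
    segments = []
    cursor = 0.0  # start of the next span we intend to keep
    for i, (_, start, end) in enumerate(words):
        if i in deleted:
            if start > cursor:
                segments.append((cursor, start))  # keep everything up to the cut
            cursor = max(cursor, end)  # resume after the deleted word
    if cursor < total:
        segments.append((cursor, total))  # keep the tail of the recording
    return segments
```

Given the words "hey" (0.0-0.4), "um" (0.5-0.8), "there" (0.9-1.3) in a 1.5-second clip, deleting index 1 leaves the spans (0.0, 0.5) and (0.8, 1.5), which an audio tool would then concatenate.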

Then you can download three files: your edited audio, your original file with cuts marked in metadata for importing to Audition/Audacity, and labels for importing into Audacity.
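For context on the labels file: Audacity's label import reads plain text with one label per line, written as start time, end time, and label text separated by tabs, with times in seconds. A minimal sketch of writing cut points in that layout follows. The `audacity_labels` helper and the sample cut are hypothetical and not taken from SpeechBoard.

```python
from typing import Iterable, Tuple


def audacity_labels(cuts: Iterable[Tuple[float, float, str]]) -> str:
    """Format (start_sec, end_sec, label) cuts as an Audacity label track.

    Each output line is: start<TAB>end<TAB>label
    """
    lines = [f"{start:.6f}\t{end:.6f}\t{label}" for start, end, label in cuts]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical cut: the word "um" between 0.5 s and 0.8 s.
    print(audacity_labels([(0.5, 0.8, "cut: um")]), end="")
```

Saving that string to a `.txt` file and importing it as labels in Audacity should mark each cut region so the in/out points can be adjusted by hand.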

Nailing the in/out points of words was the hardest part, which led us to create the Audition/Audacity import feature and now I think that's the best part :)

Let us know what you think!

AndrewUnmuted · on Nov 11, 2017

This is great! You've nailed one of the major pain-points in creating and producing high quality podcasts/audiobooks.

The other major pain-point that I've identified after spending 10 years in the podcast and audiobook industries is audio mastering. As a mastering engineer, I know how troublesome it can be to master audio so that it is at a competitive dynamic range.

Would you be interested in implementing an "auto-master" feature that I've developed? If so, please reach out! My email is in my HN profile.

jv22222 · on Nov 11, 2017

I think you can pass the output audio file through https://auphonic.com/ and that will get the job done for you.

OzzyB · on Nov 10, 2017

Really nice + fun.

One suggestion is to be able to add extra silence (whitenoise?) so we can retain a natural flow after cutting aggressively.

Maybe recognize an ellipsis (...) and add a second or two?

craigcannon · on Nov 10, 2017

Smart! Keywords could be really handy. [Add Intro Music] and that sort of thing.

erikig · on Nov 10, 2017

This is pretty great, I will see if we can use it for our conference calls as well. Could you share some of the technical details behind the project?

craigcannon · on Nov 10, 2017

Nice! Yeah, we've run into a few interesting use cases already e.g. sermons, which I didn't even know were recorded.

So it's written in Python and the main API we rely on is IBM's Speech to Text.

One thing that became an issue was deleting half of a word or inserting characters. Now we handle that with JavaScript. In the future we'd like to get some Lyrebird in there :)

If you have specific questions just lmk.

toomuchtodo · on Nov 10, 2017

Impressed that you’re going to incorporate Lyrebird!

craigcannon · on Nov 10, 2017

Just trying to automate myself out of a job :)

woodson · on Nov 10, 2017

Do you have a privacy policy?

Are you storing the audio and/or edited transcripts?

craigcannon · on Nov 10, 2017

We're not storing anything.

But we don't have a formal policy. We'll work on that.

woodson · on Nov 10, 2017

Thanks! It's likely obvious to users with a technical background, but others might not realize that the STT APIs used by this service could retain uploaded audio data.

michaelmior · on Nov 10, 2017

Very cool! I did find that if I remove "three" there's a noticeable jump in the resulting audio. But it's great that I can take the output and further refine it.

craigcannon · on Nov 10, 2017

Thanks! Yeah, the word in/out points still aren't perfect so the imports really help there.

synthecypher · on Nov 10, 2017

Hey Craig, how do you transcribe the audio in your app? And presumably it only supports English (atm)?

craigcannon · on Nov 11, 2017

Hey!

We do it with a few speech to text APIs.

We've only tested with English so far but it should be able to handle a bunch: Arabic, English, Spanish, French, Portuguese, Japanese, and Mandarin.

synthecypher · on Nov 13, 2017

So not just one provider?

bhargav · on Nov 10, 2017

Is encoding time the only thing preventing this from extending to video?

craigcannon · on Nov 10, 2017

Encoding time + file size for uploads.

It's definitely something we're interested in though.

Check out ScriptSync too - http://www.avid.com/products/media-composer-scriptsync-optio...

I also talked about this project in depth on Spencer Wright's podcast - https://theprepared.org/podcast-feed/2017/10/15/craig-cannon...

Dowwie · on Nov 11, 2017

Listen to Sam Harris's latest podcast. Before he starts the show, he basically explains the problem you are solving. I think he would be interested!

craigcannon · on Nov 11, 2017

Will check it out!

dacohenii · on Nov 10, 2017

This reminds me of a recent Radiolab episode [0] in which they discuss a new technology that would allow you not only to remove words from audio, but also to add new ones in the speaker's voice -- and likeness, in the case of video. They made an example video [1], which was not super-convincing, but a very good demonstration of what it may be able to do in the future.

Of course, the focal point of the episode isn't the technology itself, but the implication it could have on society once it gets good enough that its output is virtually indistinguishable from real video (i.e. fake news in the form of convincing-looking videos).

Highly recommended if you have the time.

[0] https://www.radiolab.org/story/breaking-news/ [1] https://www.futureoffakenews.com/

rrecuero · on Nov 10, 2017

Take a look at Lyrebird as well https://lyrebird.ai/ ;)

binaryjason · on Nov 10, 2017

Very interesting application. I am not sure if you guys have looked into this, but there is a Python library that can detect timestamps on word level if given the audio and transcript. It's pretty accurate for English: https://github.com/readbeyond/aeneas

craigcannon · on Nov 10, 2017

Thanks! Yeah, we have used it and can confirm it's pretty good :)

levistoddard · on Nov 10, 2017

I've been working on a similar problem, and I'm excited to share as well.

Sample:

https://reader.listensynced.com/ycombinator-jessica-livingst...

As a podcast junkie - I've often run into issues with searching and sharing. Linking transcript and audio is the first step to solving this...

craigcannon · on Nov 10, 2017

Neat! I know that guy :)

graham1776 · on Nov 10, 2017

There is a podcast I listen to every time it comes out (Uhh Yeah Dude), but the creators don't create transcripts. One idea I had was directly targeting podcast creators and creating a service whereby you create searchable transcripts for listeners looking for an old podcast.

I think the transcript creation service in itself must be worth something for these guys.

froindt · on Nov 10, 2017

I would be quite interested in that, especially if it could integrate with my podcast app.

Since August of 2016 I've listened to 30 days worth of podcasts and saved another 22 days worth of time by skipping introductions, listening faster than 1x, etc. The most annoying thing I'm facing is that I've heard hundreds of stories and the audio cannot be indexed easily. If I want to send a friend to a specific episode for a certain story, I have no good way to remember if it was the Freakonomics podcast, This American Life, Story Collider, Planet Money, or one of the 20+ other podcasts I listen to.

I'd love a system which would make available a searchable transcript of every podcast. I couldn't pay for transcribing all of them, but I'd pay 50 cents/podcast. Google tells me $1.75/minute will get a transcription from the top listed service, so if we had 210 people like me, we could transcribe an hour of audio.
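The arithmetic behind that 210 figure, as a quick sketch. It assumes the quoted 1.75 per minute and the 50-cent pledge are in the same currency. The `backers_needed` name is made up for illustration.

```python
import math


def backers_needed(rate_per_minute: float, minutes: float, pledge: float) -> int:
    """Number of people pledging `pledge` each needed to cover a transcription."""
    cost = rate_per_minute * minutes  # total cost of the transcription
    return math.ceil(cost / pledge)   # round up: a partial backer still has to exist


if __name__ == "__main__":
    # One hour of audio at 1.75 per minute costs 105.00; at 0.50 each that is 210 people.
    print(backers_needed(1.75, 60, 0.50))
```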

craigcannon · on Nov 11, 2017

Roger that. Thanks!

thenomad · on Nov 11, 2017

I've said this a few times but happy to say it again: I will very cheerfully pay for a good podcast -> transcript service.

I speed-read. I don't speed-listen. And my lifestyle has zero podcast-compatible travel time. So there's a whole world of great content in a very user-unfriendly format for me currently.

craigcannon · on Nov 11, 2017

Thanks for the input!

craigcannon · on Nov 10, 2017

Yeah, I think there are totally people out there who'd pay for it.

YC does :)

For the YC podcast we chose to host with Backtracks because they offer transcription and allow you to link back to exact times in the episode, sort of like YouTube.

rectangletangle · on Nov 10, 2017

I like how the core functionality/UI is immediately accessible above the fold. No log in, or other barriers to simply trying it out.

Great job!

craigcannon · on Nov 10, 2017

Thanks! :)

sturmen · on Nov 10, 2017

This is excellent. Looking forward to the full release with the hope that pricing is affordable for hobbyists. :)

craigcannon · on Nov 10, 2017

Thanks! Can you email us?

human@speechboard.co

We're looking to chat with hobbyists to see what you'd like out of it.

mistercow · on Nov 10, 2017

This is really cool. There's something magical about just editing text, and then having real spoken audio change to reflect it.

I hit an error when I trimmed the text down to:

> Hey this is a different original.

> The text cuts into

Maybe I was too aggressive?

Some undo support would also be really helpful. Have you considered just having a free-form text field, then using something like wdiff to produce the edits? That might make the UX easier, since you wouldn't have to manually reinvent the text editing tools people expect (although you'd have to handle invalid edits, like people adding new words).
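To make the wdiff idea concrete, here is a minimal sketch that swaps in Python's standard-library `difflib` in place of wdiff. It compares the original transcript with the free-form edit word by word, returns the indices of deleted words, and rejects edits that add or change words. The `deleted_word_indices` name is hypothetical, and splitting on whitespace is a simplifying assumption.

```python
import difflib
from typing import List


def deleted_word_indices(original: str, edited: str) -> List[int]:
    """Return indices of words in `original` that are missing from `edited`.

    Raises ValueError if the edit inserts or replaces words, since only
    deletions can be mapped back to cuts in the audio.
    """
    a = original.split()
    b = edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    deleted: List[int] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            deleted.extend(range(i1, i2))
        elif tag in ("insert", "replace"):
            raise ValueError(f"edit adds or changes words: {b[j1:j2]}")
    return deleted
```

For example, editing "hey um this is a test" down to "hey this is test" yields indices 1 and 4, while typing a new word into the text raises `ValueError`, which the UI could surface as an invalid edit.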

craigcannon · on Nov 10, 2017

Thanks!

Yeah, without looking at your logs I suspect you cut too much. :)

We were using a free-form text field before and it led to a couple of issues: cutting words in half + inserted characters. Both of those basically break it now, so we went for a slightly less convenient but mostly functional demo.

I totally agree though, this needs a lot of polish on the UX side.

onuralp · on Nov 10, 2017

Hi Craig,

This looks very interesting. You might be the right person to ask about something related that I am currently working on: do you know of any app that would extract keyword / name based parts of audio? For example, extract only the parts where Elon Musk speaks given audio input (podcast, YouTube etc.)? Alternatively, extract only the parts (-30 and +30 seconds) when a specific word is mentioned.

Thanks!
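One way the keyword case could look, as a minimal sketch. It assumes a transcript with word-level timestamps is already available and sorted by time. The `keyword_windows` name and the `(text, start, end)` tuple layout are hypothetical. The function finds each mention of a word, pads it by 30 seconds on each side, and merges windows that overlap.

```python
from typing import List, Optional, Tuple

# Hypothetical word record: (text, start_seconds, end_seconds)
Word = Tuple[str, float, float]


def keyword_windows(words: List[Word], keyword: str, pad: float = 30.0,
                    total: Optional[float] = None) -> List[Tuple[float, float]]:
    """Return merged (start, end) windows of +/- `pad` seconds around each mention."""
    target = keyword.lower()
    windows: List[Tuple[float, float]] = []
    for text, start, end in words:
        if text.lower().strip(".,!?") != target:
            continue
        lo = max(0.0, start - pad)  # clamp to the start of the audio
        hi = end + pad if total is None else min(total, end + pad)
        if windows and lo <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], hi))
        else:
            windows.append((lo, hi))
    return windows
```

The resulting windows can then be handed to any audio tool that can export time ranges. Picking out a specific speaker is a separate problem (speaker diarisation) that this sketch does not attempt.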

craigcannon · on Nov 10, 2017

Hey!

Audiogrep may be able to do that for ya - https://github.com/antiboredom/audiogrep

frik · on Nov 10, 2017

> This looks very interesting. You might be the right person to ask about something related

Hi Craig, do you know an app/code that can split the audio/transcript based on persons? Detect different persons in a podcast and group the transcript by person. Thanks!

craigcannon · on Nov 10, 2017

Hey!

That's something we're also interested in.

You can read up on the subject and see a few projects here - https://en.wikipedia.org/wiki/Speaker_diarisation

https://dsp.stackexchange.com/questions/3119/library-to-diff...

But to answer your question, I have yet to try an app that can do it well.

raja · on Nov 10, 2017

Speechmatics diarisation is pretty good. https://www.speechmatics.com/

craigcannon · on Nov 10, 2017

Cool. Will check it out.

frik · on Nov 10, 2017

Thanks for the hints!

onuralp · on Nov 10, 2017

Awesome, I'll check it out - thanks!

vermontdevil · on Nov 10, 2017

My question is about the availability of transcripts for deaf people and others to use. Is this possible as another feature of your service?

craigcannon · on Nov 10, 2017

What other features would you need to make it work well for you?

vermontdevil · on Nov 10, 2017

Addendum - see this blog post to see where I’m coming from

https://sixcolors.com/post/2017/03/the-dream-of-converting-p...

vermontdevil · on Nov 10, 2017

Hi. Not for me. Just thinking out loud: if the transcripts are automated by your service, it’d be a way for podcasters to provide them along with their audio recordings.

craigcannon · on Nov 10, 2017

Ah. Gotcha.

dogruck · on Nov 11, 2017

I would like an (automated) “IMDB for Podcasts.” Specifically, I would like to be able to find, and be notified of, every podcast where a given person speaks.

Similarly, I’d like automated data on what ads are run/read.

Essentially, I’d like rich automated metadata, in addition to timestamped transcripts.

craigcannon · on Nov 11, 2017

Yes! I feel that way too. So far I've found Breaker has the best guest search. Definitely lots to work on in the podcast space :)

jtbayly · on Nov 10, 2017

Won't let me upload a file in Safari, so I tried Chrome. Upload works, but then I just get "An error has occured. We are looking into it."

Like the demo. Wish I could try it out on some other audio.

craigcannon · on Nov 10, 2017

Try incognito in Chrome. We're trying to sort out that bug. Thanks for your patience!

jtbayly · on Nov 10, 2017

No dice, but I signed up for your mailing list. Will look forward to the final product.

orliesaurus · on Nov 10, 2017

Somewhat relevant (but more enterprise-y) project based in Austin, TX for anyone local: http://clarify.io/

patwalls · on Nov 10, 2017

Hey! This is awesome.

Side question: I just need (good) transcription of audio. I've never been able to find a good service for the price.

Does anyone have any recommendations?

GarethX · on Nov 10, 2017

I’ve used rev.com for a while. They’re quick and accurate.

patwalls · on Nov 10, 2017

$1 a minute is just way too expensive for what I need. That sounds like a human is doing it...

Any services doing this automated?

thenomad · on Nov 11, 2017

I just discovered https://trint.com .

No idea if they're any good, but they're certainly cheaper than Rev.

geetfun · on Nov 10, 2017

Really magical to see audio editable like this. Love it.

craigcannon · on Nov 10, 2017

Thanks!

orliesaurus · on Nov 10, 2017

What languages are you supporting out of the box?

craigcannon · on Nov 10, 2017

We've only tested with English so far but it should be able to handle a bunch: Arabic, English, Spanish, French, Portuguese, Japanese, and Mandarin.

If you test another language out definitely let us know how it performs.

phirschybar · on Nov 10, 2017

cool idea

craigcannon · on Nov 10, 2017

Thanks!

hn_hates_tor3 · on Nov 10, 2017

Is this a YC project or a personal project? Are you applying to YC with this to get funding by any chance? Nice one by the way!