Show HN: SpeechBoard – Edit Podcasts from the Transcript (speechboard.co)
190 points by craigcannon on Nov 10, 2017 | 69 comments



Hey HN!

Craig from YC here. Ramon Recuero and I built SpeechBoard.

Here's how it works: you record a podcast and upload it to SpeechBoard. We run it through a few speech to text APIs to generate a transcription for you. From there you can delete words from the transcript and we cut them from the audio.
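
For readers curious how deleted words could map to audio cuts, here is a minimal sketch. It is not SpeechBoard's actual code. It assumes each transcribed word comes with start/end timestamps in seconds, and the names `cut_ranges` and `kept_segments` are made up for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical word record: (text, start_seconds, end_seconds).
Word = Tuple[str, float, float]
Range = Tuple[float, float]


def cut_ranges(words: List[Word], deleted: Set[int]) -> List[Range]:
    """Turn deleted word indices into time ranges to cut.

    Runs of consecutive deleted words become a single cut, so the
    short gaps between them are removed as well.
    """
    cuts: List[Range] = []
    prev_deleted = False
    for i, (_, start, end) in enumerate(words):
        if i in deleted:
            if prev_deleted:
                # Extend the current cut across the gap to this word's end.
                cuts[-1] = (cuts[-1][0], end)
            else:
                cuts.append((start, end))
            prev_deleted = True
        else:
            prev_deleted = False
    return cuts


def kept_segments(cuts: List[Range], total: float) -> List[Range]:
    """Invert the cuts: return the ranges of audio that remain."""
    kept: List[Range] = []
    cursor = 0.0
    for start, end in cuts:
        if start > cursor:
            kept.append((cursor, start))
        cursor = end
    if cursor < total:
        kept.append((cursor, total))
    return kept
```

The kept ranges could then be handed to any audio tool that can slice and concatenate by timestamp. As noted further down the thread, the hard part in practice is getting accurate word boundaries, which this sketch takes for granted.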

Then you can download three files: your edited audio, your original file with cuts marked in metadata for importing to Audition/Audacity, and labels for importing into Audacity.
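
As an illustration of the labels file, Audacity's label track import reads plain text with one label per line: start time, end time, and label text, separated by tabs. The sketch below writes cut ranges in that layout. The function name and the sample label text are hypothetical, and this is not necessarily how SpeechBoard generates its file.

```python
from typing import Iterable, Tuple


def audacity_labels(cuts: Iterable[Tuple[float, float, str]]) -> str:
    """Format (start, end, text) cuts as an Audacity label track.

    Each line is: start<TAB>end<TAB>label, with times in seconds.
    """
    lines = [f"{start:.6f}\t{end:.6f}\t{text}" for start, end, text in cuts]
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Hypothetical cuts: the label text is whatever helps the editor.
    print(audacity_labels([(0.5, 0.8, "cut: um"), (12.25, 13.0, "cut: you know")]), end="")
```

Saving that output as a .txt file and importing it as labels should place a marker over each proposed cut, so the in/out points can be nudged by hand before deleting.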

Nailing the in/out points of words was the hardest part, which led us to create the Audition/Audacity import feature, and now I think that's the best part :)

Let us know what you think!


This is great! You've nailed one of the major pain-points in creating and producing high quality podcasts/audiobooks.

The other major pain-point that I've identified after spending 10 years in the podcast and audiobook industries is audio mastering. As a mastering engineer, I know how troublesome it can be to master audio so that it is at a competitive dynamic range.

Would you be interested in implementing an "auto-master" feature that I've developed? If so, please reach out! My email is in my HN profile.


I think you can pass the output audio file through https://auphonic.com/ and that will get the job done for you.


Really nice + fun.

One suggestion is to be able to add extra silence (white noise?) so we can retain a natural flow after cutting aggressively.

Maybe recognize an ellipsis ... and add a second or two?


Smart! Keywords could be really handy. [Add Intro Music] and that sort of thing.


This is pretty great, I will see if we can use it for our conference calls as well. Could you share some of the technical details behind the project?


Nice! Yeah, we've run into a few interesting use cases already e.g. sermons, which I didn't even know were recorded.

So it's written in Python and the main API we rely on is IBM's Speech to Text.

One thing that became an issue was deleting half of a word or inserting characters. Now we handle that with JavaScript. In the future we'd like to get some Lyrebird in there :)

If you have specific questions just lmk.


Impressed that you’re going to incorporate Lyrebird!


Just trying to automate myself out of a job :)


Do you have a privacy policy?

Are you storing the audio and/or edited transcripts?


We're not storing anything.

But we don't have a formal policy. We'll work on that.


Thanks! It's likely obvious to users with a technical background, but others might not realize that the STT APIs used by this service could retain uploaded audio data.


Very cool! I did find that if I remove "three" there's a noticeable jump in the resulting audio. But it's great that I can take the output and further refine it.


Thanks! Yeah, the word in/out points still aren't perfect so the imports really help there.


Hey Craig, how do you transcribe the audio in your app? And presumably it only supports English (atm)?


Hey!

We do it with a few speech to text APIs.

We've only tested with English so far but it should be able to handle a bunch: Arabic, English, Spanish, French, Portuguese, Japanese, and Mandarin.


So not just one provider?


Is encoding time the only thing preventing this from extending to video?


Encoding time + file size for uploads.

It's definitely something we're interested in though.

Check out ScriptSync too - http://www.avid.com/products/media-composer-scriptsync-optio...

I also talked about this project in depth on Spencer Wright's podcast - https://theprepared.org/podcast-feed/2017/10/15/craig-cannon...


Listen to Sam Harris's latest podcast. Before he starts the show, he basically explains the problem you are solving. I think he would be interested!


Will check it out!


This reminds me of a recent Radiolab episode [0] in which they discuss a new technology that would allow you not only to remove words from audio, but also to add new ones in the speaker's voice -- and likeness, in the case of video. They made an example video [1], which was not super-convincing, but a very good demonstration of what it may be able to do in the future.

Of course, the focal point of the episode isn't the technology itself, but the implications it could have for society once it gets good enough that its output is virtually indistinguishable from real video (i.e. fake news in the form of convincing-looking videos).

Highly recommended if you have the time.

[0] https://www.radiolab.org/story/breaking-news/

[1] https://www.futureoffakenews.com/


Take a look at Lyrebird as well https://lyrebird.ai/ ;)


Very interesting application. I am not sure if you guys have looked into this, but there is a Python library that can detect timestamps on word level if given the audio and transcript. It's pretty accurate for English: https://github.com/readbeyond/aeneas


Thanks! Yeah, we have used it and can confirm it's pretty good :)


I've been working on a similar problem, and excited to share as well.

Sample:

https://reader.listensynced.com/ycombinator-jessica-livingst...

As a podcast junkie - I've often run into issues with searching and sharing. Linking transcript and audio is first step to solving this...


Neat! I know that guy :)


There is a podcast I listen to every time it comes out (Uhh Yeah Dude), but the creators don't create transcripts. One idea I had was directly targeting podcast creators with a service that creates searchable transcripts for listeners looking for an old podcast.

I think the transcript creation service in itself must be worth something for these guys.


I would be quite interested in that, especially if it could integrate with my podcast app.

Since August of 2016 I've listened to 30 days worth of podcasts and saved another 22 days worth of time by skipping introductions, listening faster than 1x, etc. The most annoying thing I'm facing is that I've heard hundreds of stories and the audio cannot be indexed easily. If I want to send a friend to a specific episode for a certain story, I have no good way to remember if it was the Freakonomics podcast, This American Life, Story Collider, Planet Money, or one of the 20+ other podcasts I listen to.

I'd love a system which would make available a searchable transcript of every podcast. I couldn't pay for transcribing all of them, but I'd pay 50 cents/podcast. Google tells me 1.75/minute will get a transcription from the top listed service, so if we had 210 people like me, we could transcribe an hour of audio.
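
To spell out that arithmetic, here is a quick sketch using only the figures quoted above (1.75 per minute, 50 cents per listener, a 60-minute episode). The function name is made up.

```python
import math


def listeners_needed(rate_per_minute: float, minutes: float, contribution: float) -> int:
    """How many listeners must chip in to cover one transcription."""
    total_cost = rate_per_minute * minutes
    return math.ceil(total_cost / contribution)


if __name__ == "__main__":
    # 1.75 * 60 = 105 for the hour; 105 / 0.50 = 210 listeners.
    print(listeners_needed(1.75, 60, 0.50))
```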


Roger that. Thanks!


I've said this a few times but happy to say it again: I will very cheerfully pay for a good podcast -> transcript service.

I speed-read. I don't speed-listen. And my lifestyle has zero podcast-compatible travel time. So there's a whole world of great content in a very user-unfriendly format for me currently.


Thanks for the input!


Yeah, I think there are totally people out there who'd pay for it.

YC does :)

For the YC podcast we chose to host with Backtracks because they offer transcription and allow you to link back to exact times in the episode, sort of like YouTube.


I like how the core functionality/UI is immediately accessible above the fold. No log in, or other barriers to simply trying it out.

Great job!


Thanks! :)


This is excellent. Looking forward to the full release with the hope that pricing is affordable for hobbyists. :)


Thanks! Can you email us?

human@speechboard.co

We're looking to chat with hobbyists to see what you'd like out of it.


This is really cool. There's something magical about just editing text, and then having real spoken audio change to reflect it.

I hit an error when I trimmed the text down to:

> Hey this is a different original.

> The text cuts into

Maybe I was too aggressive?

Some undo support would also be really helpful. Have you considered just having a free-form text field, then using something like wdiff to produce the edits? That might make the UX easier, since you wouldn't have to manually reinvent the text editing tools people expect (although you'd have to handle invalid edits, like people adding new words).
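
A rough sketch of that diff idea, using Python's standard `difflib` as a stand-in for wdiff (my substitution, not something SpeechBoard is known to use). It compares the original and edited transcripts word by word, returns the indices of deleted words, and rejects edits that add or change words. The function name is hypothetical.

```python
import difflib
from typing import List


def deleted_word_indices(original: str, edited: str) -> List[int]:
    """Return indices of words in `original` that are missing from `edited`.

    Raises ValueError if the edit inserts or replaces words, since only
    deletions can be mapped back onto the existing audio.
    """
    a, b = original.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    deleted: List[int] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            deleted.extend(range(i1, i2))
        elif tag != "equal":
            # "insert" or "replace": the edited text contains new words.
            raise ValueError(f"edit adds or changes words: {b[j1:j2]}")
    return deleted
```

One caveat: when the same word appears several times, a diff can only guess which occurrence was removed, so a real editor would probably still want to track cursor positions.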


Thanks!

Yeah, without looking at your logs I suspect you cut too much. :)

We were using a free-form text field before and it led to a couple of issues: cutting words in half and inserting new text. Both of those basically break it right now, so we went for a slightly less convenient but mostly functional demo.

I totally agree though, this needs a lot of polish on the UX side.


Hi Craig,

This looks very interesting. You might be the right person to ask about something related that I am currently working on: do you know of any app that would extract keyword / name based parts of audio? For example, extract only the parts where Elon Musk speaks given audio input (podcast, YouTube etc.)? Alternatively, extract only the parts (-30 and +30 seconds) when a specific word is mentioned.
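
To make the second idea concrete, here is a minimal sketch. It assumes a transcript with per-word timestamps is already available, for example from a speech-to-text service that returns them. It finds each mention of a keyword, pads it by 30 seconds on each side, and merges overlapping windows. All names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical word record: (text, start_seconds, end_seconds).
Word = Tuple[str, float, float]


def keyword_windows(
    words: List[Word],
    keyword: str,
    pad: float = 30.0,
    total: Optional[float] = None,
) -> List[Tuple[float, float]]:
    """Return merged (start, end) windows around each keyword mention.

    `words` must be in chronological order. `total` is the audio length;
    when given, windows are clipped to it.
    """
    target = keyword.lower()
    windows: List[Tuple[float, float]] = []
    for text, start, end in words:
        if text.lower().strip(".,!?") != target:
            continue
        lo = max(0.0, start - pad)
        hi = end + pad if total is None else min(total, end + pad)
        if windows and lo <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], hi))
        else:
            windows.append((lo, hi))
    return windows
```

Extracting only the parts where a particular person speaks is a different problem (speaker diarisation) and is not covered by this sketch.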

Thanks!


Hey!

Audiogrep may be able to do that for ya - https://github.com/antiboredom/audiogrep


> This looks very interesting. You might be the right person to ask about something related

Hi Craig, do you know an app/code that can split the audio/transcript based on persons? Detect different persons in a podcast and group the transcript by person. Thanks!


Hey!

That's something we're also interested in.

You can read up on the subject and see a few projects here - https://en.wikipedia.org/wiki/Speaker_diarisation

https://dsp.stackexchange.com/questions/3119/library-to-diff...

But to answer your question, I have yet to try an app that can do it well.


Speechmatics diarisation is pretty good. https://www.speechmatics.com/


Cool. Will check it out.


Thanks for the hints!


Awesome, I'll check it out - thanks!


My question is about the availability of transcripts for deaf people and others to use. Is this possible as another feature of your service?


What other features would you need to make it work well for you?


Addendum - see this blog post to see where I’m coming from

https://sixcolors.com/post/2017/03/the-dream-of-converting-p...


Hi. Not for me. Just thinking out loud: if the transcripts are automated by your service, it would be a way for podcasters to provide them along with their audio recordings.


Ah. Gotcha.


I would like an (automated) “IMDB for Podcasts.” Specifically, I would like to be able to find, and be notified of, every podcast where a given person speaks.

Similarly, I’d like automated data on what ads are run/read.

Essentially, I’d like rich automated metadata, in addition to timestamped transcripts.


Yes! I feel that way too. So far I've found Breaker has the best guest search. Definitely lots to work on in the podcast space :)


Won't let me upload a file in Safari, so I tried Chrome. Upload works, but then I just get "An error has occured. We are looking into it."

Like the demo. Wish I could try it out on some other audio.


Try incognito in Chrome. We're trying to sort out that bug. Thanks for your patience!


No dice, but I signed up for your mailing list. Will look forward to the final product.


Somewhat relevant (but more enterprise-y) project based in Austin, TX for anyone local: http://clarify.io/


Hey! This is awesome.

Side question: I just need (good) transcription of audio. I've never been able to find a good service for the price.

Does anyone have any recommendations?


I’ve used rev.com for a while. They’re quick and accurate.


$1 a minute is just way too expensive for what I need. That sounds like a human is doing it...

Any services doing this automated?


I just discovered https://trint.com .

No idea if they're any good, but they're certainly cheaper than Rev.


Really magical to see audio editable like this. Love it.


Thanks!


what languages are you supporting out of the box?


We've only tested with English so far but it should be able to handle a bunch: Arabic, English, Spanish, French, Portuguese, Japanese, and Mandarin.

If you test another language out definitely let us know how it performs.


cool idea


Thanks!


Is this a YC project or a personal project? Are you applying to YC with this to get funding by any chance? Nice one by the way!



