Show HN: DeepSpeech based automated transcription service

bob_theslob646 · on June 29, 2018

How does this compare to Google's speech API?

braindead_in · on June 29, 2018

We are planning to do a benchmark with Google Web Speech once it adds support for multi-speaker files. When we started building this, we ran our internal test set through Google Web Speech, and the WER came to around 18%.

pell · on June 29, 2018

I would recommend also testing with IBM's Watson Speech. In my usage it was a lot more accurate than Google and Azure. I also did a couple of tests with AWS and Watson was always ahead. All these tests were with American and British English.

_oya8 · on June 29, 2018

In general we're pretty happy with Trint for American & British accents for our stuff (though not to say we won't take a look at what you've got :)). They usually require a bit of tweaking, but it's pretty good. The killer feature for us would be training against people with other accents. You'll notice our transcripts really constitute a pretty big part of what we do, so a good quality transcription service for people with different accents would be an awesome thing.

e.g. clearly once this course leaves early access we'll want to get this copy-edited, and Yan here is British, so even here Trint's not always great :) https://livevideo.manning.com/module/38_1_1/production-ready...?

braindead_in · on June 29, 2018

We do provide an option to get the transcripts corrected manually by our transcribers. Would that work for you?

gok · on June 29, 2018

By "95% accuracy on the LibriSpeech" you mean a word error rate of 5% on that test set, using your own training data?

braindead_in · on June 29, 2018

The WER on the LibriSpeech clean test set is around 0.087 and the CER is 0.030. We trained on a dataset of around 5,000 hours, which included LibriSpeech train.
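
For readers unfamiliar with these metrics: WER is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words, and CER is the same ratio at the character level. "Accuracy" in this thread is 1 - WER, so a WER of 0.087 corresponds to roughly 91.3%. Below is a minimal sketch of that conventional definition, assuming whitespace tokenization and no text normalization (published benchmarks usually normalize case and punctuation first). The function names are illustrative and are not taken from the service or from DeepSpeech.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: ref token missing from hyp
                curr[j - 1] + 1,     # insertion: extra token in hyp
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def word_accuracy(word_error_rate: float) -> float:
    """Accuracy as used in this thread: 1 - WER."""
    return 1.0 - word_error_rate


if __name__ == "__main__":
    # Made-up example pair, not data from the service.
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER: {wer(ref, hyp):.3f}")
    print(f"CER: {cer(ref, hyp):.3f}")
```

Note that WER can exceed 1.0 when the output contains many inserted words, so 1 - WER is only a loose notion of accuracy.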

gok · on June 29, 2018

So where did 95% come from?

braindead_in · on June 29, 2018

I think I made a mistake there. It should have been 92% accuracy. Can't find the edit button now.

gok · on June 29, 2018

Have you considered using a more traditional HMM style recognizer? The stock Kaldi chain model recipe should get you more like 5 or 4.5% WER on LibriSpeech.

braindead_in · on July 2, 2018

We actually started our experiments with Kaldi and even built a dataset out of our files to train on. But we found that Kaldi required a lot of data prep and a long lead time. Our internal dataset is quite large, and data prep for DeepSpeech is quite easy compared to Kaldi.

chadmeister · on June 29, 2018

You mean you meant to type 91.3% but accidentally typed 95%? Were you using your own transaction software?

ryan-allen · on June 29, 2018

This is pretty great, I'm going to show it to our UX person who works with transcripts from user testing. The editor feature is pretty great for cleaning up transcripts, and I think it'd be faster to do that than to manually do it as we have been.

The automated process is pretty funny when working with Australian English though!

> Upset is gonna record you forget it out, so... alright, so I just wanted to start saying, What is your family technology? And I can send a man and I can use Excel and Word, such the internet use as post things of baseball pesticides that you... absolutely, absolutely, but a part of not a great deal.

It's pretty good despite the chaotic stop start of conversations between two or three people.

braindead_in · on June 29, 2018

Yeah, the predictions are bad around the speaker turns right now. We are working on a better turns model and that should fix this issue.

goesprotocall · on June 29, 2018

You said it's free but I got a $5.90 charge!

braindead_in · on June 29, 2018

Try choosing Auto transcribe from the dropdown menu next to the Order Transcript button.

titanix2 · on June 29, 2018

Consider saying explicitly which languages are supported by your service, because it is not written anywhere in the blog post.

braindead_in · on June 29, 2018

Done. Thanks!

kvz · on June 29, 2018

How does this compare to trint.com?

braindead_in · on June 29, 2018

We are pretty much on par with Trint. They use the API from Speechmatics and, by our benchmark, we are better than Speechmatics on conversational audio. We will be posting proper benchmark numbers soon. We are building a podcast dataset for testing right now.

bwill94070 · on June 29, 2018

Have you checked out http://otter.ai? They do real-time transcription in the browser and on iOS and Android.

Arn_Thor · on June 29, 2018

I'm always looking to speed up the transcription process. I'll be sure to give this a try! The editor is especially useful.

edent · on June 29, 2018

Seems to be tuned to American English - coped poorly with my British accent.

I can't see a way to delete uploaded files. Am I missing something?

braindead_in · on June 29, 2018

The dropdown menu next to the Edit Transcript button has the delete option. It works best for North American files with clean audio. We will eventually get it as good for British, Australian and all other accents as well. We build a new model almost every month, based on the corrections our transcribers make.

CommanderData · on June 29, 2018

What about real-time audio transcription? Will you add support for this?

braindead_in · on June 29, 2018

Real time transcription is not something we are looking at right now. Our goal with this is to see if we can assist our transcribers and improve the efficiency of our system. So our focus is offline transcription for now.