Hey, I'm the blog post author (Deepgram cofounder too)! Sorry if the accuracy wa...

detaro · on Oct 10, 2016

Is it intentional that your website has no pricing information or specific details about what one would get after signing up for an API key? Some blog posts reference pricing, but there's no way to tell whether those are current or what the APIs look like. Not very inviting to play with stuff :/

garysieling · on Oct 10, 2016

I've been using Deepgram to build https://www.findlectures.com. I don't know about the marketing site, but there is pricing within the app.

mdrzn · on Oct 10, 2016

If you sign up, it's 5 hours for free, then $5 for 6.7 hours (roughly $0.75 per hour). I figured it out just now.

detaro · on Oct 12, 2016

So their old blogposts are inaccurate...

kastnerkyle · on Oct 10, 2016

There are a number of papers tackling a similar task [0][1][2] for anyone who is interested. There isn't enough information to tell exactly what is going on with Deepgram, but one way to approach this would be to construct a shared embedding space for words/phrases and speech. These types of embedding spaces are powerful [3][4][5], but not magic.
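To make the shared embedding space idea concrete, here is a minimal sketch. It is not a description of what Deepgram does. It assumes some trained pair of encoders (not shown) has already mapped both audio segments and text queries into the same vector space; the vectors, the segment names, and the `search` helper below are all hypothetical and hand-made purely for illustration. Once both modalities live in one space, search reduces to nearest-neighbor lookup:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def search(query_vec, segments, top_k=2):
    """Rank audio segments by similarity to a text query embedding."""
    scored = [(cosine(query_vec, vec), seg_id) for seg_id, vec in segments.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(seg_id, score) for score, seg_id in scored[:top_k]]


# Hypothetical embeddings: in a real system these would come from a
# speech encoder and a text encoder trained to share one space.
AUDIO_SEGMENTS = {
    "clip_00:12": [0.9, 0.1, 0.0],
    "clip_01:47": [0.1, 0.8, 0.2],
    "clip_03:05": [0.0, 0.2, 0.9],
}
TEXT_QUERY = [0.85, 0.2, 0.05]  # made-up embedding of a typed search phrase

results = search(TEXT_QUERY, AUDIO_SEGMENTS)
for seg_id, score in results:
    print(f"{seg_id}: {score:.3f}")
```

The hard part, of course, is training the encoders so that matching speech and text actually land near each other; the lookup itself is the easy bit.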

Cool demo, looking forward to seeing more detail about what is going on. However, I would quibble with the STT WER quoted above. Maybe in noisy environments with unknown speakers (and no voice normalization) this is accurate, but the kind of clean speech in the demo is handled really well by modern recognition engines (on benchmark data, to be fair; cf. MSR at 6.3% and IBM at ~6.9%).

Most word searches over speech-to-text output work over soft matches (or ideally a beam search over the most likely partial phoneme/word-part matches) rather than hard matches, so it seems like a bit of a straw man comparison in this case.
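As a rough illustration of the hard vs. soft distinction, here is a small sketch. It uses character-level similarity from Python's standard `difflib` as a stand-in for phoneme-level or lattice-based matching, which is what a real system would more likely use. The transcript, the misrecognized word, and the 0.7 threshold are all assumptions made up for the example:

```python
from difflib import SequenceMatcher


def hard_match(query, words):
    """Return indices of transcript words that equal the query exactly."""
    return [i for i, w in enumerate(words) if w == query]


def soft_match(query, words, threshold=0.7):
    """Return (index, word, score) for words that are similar enough.

    Similarity is a character-level ratio, a crude proxy for the
    phoneme-level scoring a speech search engine might use.
    """
    hits = []
    for i, w in enumerate(words):
        score = SequenceMatcher(None, query, w).ratio()
        if score >= threshold:
            hits.append((i, w, score))
    return hits


# Hypothetical recognizer output containing one misrecognized word.
TRANSCRIPT = ["modern", "speech", "recognishun", "works", "well"]

exact_hits = hard_match("recognition", TRANSCRIPT)
fuzzy_hits = soft_match("recognition", TRANSCRIPT)

print("hard:", exact_hits)
print("soft:", [(i, w) for i, w, _ in fuzzy_hits])
```

The exact lookup misses the garbled word entirely, while the soft lookup still finds it, which is why comparing against hard matching alone understates what transcript-based search can do.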

[0] http://research.google.com/pubs/pub42543.html

[1] https://arxiv.org/abs/1510.01032

[2] https://sigport.org/sites/default/files/gloveNNLM_kaudhkhasi...

[3] https://arxiv.org/abs/1502.03044

[4] http://www-personal.umich.edu/~reedscot/files/icml2016.pdf

[5] https://arxiv.org/abs/1411.2539