
Hey all - I made this and am wondering if anyone here has any experience with pocketsphinx and could lend a hand in making the transcriptions more accurate. Let me know! (Or just make a pull request.)



Well, you already have pocketsphinx set up, so that is a start (I recently did this here: https://github.com/kastnerkyle/ez-phones but it was a little annoying to script). There are a few ways to train/extend pocketsphinx, but ultimately good ASR in arbitrary environments is still a research problem!

I second the opinion that Kaldi is more advanced, but it is also way, way more complicated to do anything with a custom dataset. There are a few examples of decoding with existing models though, so maybe that is a start. These lectures may help: http://www.danielpovey.com/kaldi-lectures.html

There is also an interesting toolbox here that you can train, though getting access to TIMIT, WSJ, etc. is pretty annoying. http://www.cs.cmu.edu/~ymiao/kaldipdnn.html

You may also get mileage out of some kind of post-transcription NLP/cleaning if you haven't done that yet.
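To make the cleaning idea concrete, here is a minimal sketch of a post-transcription pass in plain Python. It assumes the raw decoder output contains marker tokens such as `<s>`, `<sil>`, `++NOISE++`, and alternate-pronunciation suffixes like `the(2)`; check what your own output actually looks like first. The function name `clean_transcript` and the marker patterns are made up for illustration, not taken from the project.

```python
import re

# Assumed marker styles: <s>, </s>, <sil>, [NOISE], ++NOISE++.
# Adjust this pattern to whatever your decoder really emits.
MARKER_PATTERN = re.compile(r"<[^>]+>|\[[^\]]+\]|\+\+[^+]+\+\+")

# Assumed alternate-pronunciation suffix, e.g. "the(2)".
ALT_PRON_PATTERN = re.compile(r"\(\d+\)")


def clean_transcript(raw: str) -> str:
    """Strip marker tokens and collapse immediately repeated words."""
    text = MARKER_PATTERN.sub(" ", raw)
    text = ALT_PRON_PATTERN.sub("", text)
    words = text.lower().split()

    deduped = []
    for word in words:
        # Drop a word only when it repeats the one right before it.
        if not deduped or deduped[-1] != word:
            deduped.append(word)
    return " ".join(deduped)


print(clean_transcript("<s> the the(2) cat <sil> sat ++NOISE++ sat </s>"))
```

Collapsing repeats is a blunt heuristic: it also removes legitimate doubles such as "that that", so it may be worth limiting it to a short list of words the recognizer tends to stutter on.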


Have you tried using Kaldi instead? I think Kaldi has far more advanced models.

Existing trained models can be downloaded from here: http://www.kaldi-asr.org/ (via: http://www.openslr.org/12/ http://www.clsp.jhu.edu/~guoguo/papers/icassp2015_librispeec...)


I haven't, but I'll take a look. Thanks!


Can't help with pocketsphinx, but I do work with transcription sync, where accuracy depends on the source. Google/YouTube ASR is above 90% accurate, at least where well-recorded people speak at an even pace with minimal accent. Otherwise, where vocals are hard to hear, ASR isn't good enough. Human-corrected transcripts cost $1/minute today, and will be 10x more affordable soon.
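For anyone who wants to put a number on accuracy for their own recordings: the comment above does not say which metric "above 90%" refers to, but a common choice is word error rate (WER), the word-level edit distance between a human reference and the ASR output divided by the reference length. A rough sketch, assuming lowercase whitespace tokenization is good enough (the function name is just for illustration):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER: {wer:.3f}")
```

WER can exceed 1.0 when the hypothesis has many extra words, so "accuracy = 1 - WER" is only a loose reading.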


Why will human-corrected transcripts be 10x cheaper soon? Because ASR is improving? What about poorly recorded people?



