I first encountered steno'd captions in person at Google's TGIF and it was awesome even for the non-hearing-impaired. You get a few seconds of scrollback in case you miss a word, which is easy to do at large gatherings.

The only thing that seems to be missing from the !!Con transcripts is timestamps. I'm not familiar with Plover, so this might already be a feature, but being able to output one of the standard subtitle formats (SubRip, Timed Text, SSA) would make remuxing an MP4/MKV of each talk with subtitles much easier.
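
For what it's worth, SubRip is simple enough that a short script could emit it once each caption has a start and end time. A rough Python sketch - the cue data and filename here are made up for illustration, not anything Plover actually outputs:

    # Sketch: write (start_sec, end_sec, text) cues out as SubRip (.srt).
    # Cue timings are hypothetical; in practice they'd come from the
    # steno software or an alignment pass.
    def fmt(t):
        h, rem = divmod(int(t * 1000), 3600000)
        m, rem = divmod(rem, 60000)
        s, ms = divmod(rem, 1000)
        return "%02d:%02d:%02d,%03d" % (h, m, s, ms)

    cues = [(0.0, 3.2, "Welcome to !!Con!"),
            (3.2, 6.8, "Our first talk is about keyboards.")]
    with open("talk.srt", "w") as f:
        for n, (start, end, text) in enumerate(cues, 1):
            f.write("%d\n%s --> %s\n%s\n\n" % (n, fmt(start), fmt(end), text))

Once you have the .srt, ffmpeg can mux it into an MKV without re-encoding (ffmpeg -i talk.mkv -i talk.srt -c copy talk_subbed.mkv); for MP4 the subtitle stream needs converting to mov_text rather than copied.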




If you have a good transcript, you can temporarily upload the video to YouTube (mark it as private if you like; it only needs to be accessible to you) - then, in the "Captions and Subtitles" options, upload the plain text transcript.

YouTube will sync the transcript to the audio (this removes the ambiguity of it having to guess what is being said, as you're telling it - so now it knows which words it's listening out for), and you can download the resulting automatically timed file as an SBV or SRT file.

It's not 100% perfect, and is something of a hack, but it usually works pretty well. :)
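
For reference, the SBV you download is just timestamp pairs followed by the caption text, roughly like:

    0:00:01.000,0:00:04.500
    Hello and welcome to !!Con.

    0:00:04.500,0:00:08.200
    Our first talk is about keyboards.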


http://amara.readthedocs.org/en/latest/index.html (Amara: Create Captions and Subtitles) might be of interest to you.


Thank you! Those docs foxed me completely but I did take a look at Amara's main site - an interesting approach. Not a magic bullet but I love the principle of opening up tasks like these to the crowd.


Can anyone offer pointers to efficient algorithms for achieving this? It seems like the naive way would be to run voice recognition on the video, line up the matches with the transcript to get anchor points, and then interpolate between those points to build a word -> timestamp index - rough sketch below.
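
Roughly, the interpolation step I have in mind, in Python (the anchor list is made-up recognizer output, not from any real ASR library):

    # Given sparse (word_index, seconds) anchors where the recognizer
    # confidently matched the transcript, estimate a time for every
    # word by linear interpolation between neighbouring anchors.
    # (A real implementation would also extrapolate before the first
    # anchor and after the last one.)
    def interpolate_times(num_words, anchors):
        times = [None] * num_words
        for (i0, t0), (i1, t1) in zip(anchors, anchors[1:]):
            span = max(i1 - i0, 1)
            for i in range(i0, i1 + 1):
                times[i] = t0 + (t1 - t0) * (i - i0) / span
        return times

    anchors = [(0, 0.0), (50, 21.5), (120, 60.2)]  # hypothetical matches
    print(interpolate_times(121, anchors)[75])     # ~35.3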


We work with large volumes of subtitles here, and that's basically what we're aiming to do. There are a couple of commercial solutions that still do a poor job for a hefty price tag, if that's the only thing you need from them. It's not profitable compared with human sync unless you have to sync more than a couple hundred hours of content.


The BBC developed an "Assisted Subtitling" system which on paper is pretty fancy - accepts scripts in most common formats, automatically determines optimum colours for each speaker, processes video to find shot changes, uses voice recognition to spot the dialogue at the right points, and turns out an almost-complete subtitle file that just needs a look over by a human to ensure it's sensible.

Better still, it's now open source - although sadly the voice recognition part of it relies on a closed source commercial product which (at first glance) might not even still be available.

Dangerous in the wrong hands, but interesting: http://subtitling.sourceforge.net/



