I first encountered steno'd captions in person at Google's TGIF and it was awesome even for the non-hearing-impaired. You get a few seconds of scrollback in case you miss a word, which is easy to do at large gatherings.

The only thing that seems to be missing from the !!Con transcripts is timestamps. I'm not familiar with Plover, so this might already be a feature, but being able to output one of the standard subtitle formats (SubRip, Timed Text, SSA) would make remuxing an MP4/MKV of each talk with subtitles much easier.
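
For what it's worth, SubRip is simple enough that a short script could emit it once each caption has a start and end time. A rough Python sketch - the cue data and filename here are made up for illustration, not anything Plover actually outputs:

    # Sketch: write (start_sec, end_sec, text) cues out as SubRip (.srt).
    # Cue timings are hypothetical; in practice they'd come from the
    # steno software or an alignment pass.
    def fmt(t):
        h, rem = divmod(int(t * 1000), 3600000)
        m, rem = divmod(rem, 60000)
        s, ms = divmod(rem, 1000)
        return "%02d:%02d:%02d,%03d" % (h, m, s, ms)

    cues = [(0.0, 3.2, "Welcome to !!Con!"),
            (3.2, 6.8, "Our first talk is about keyboards.")]
    with open("talk.srt", "w") as f:
        for n, (start, end, text) in enumerate(cues, 1):
            f.write("%d\n%s --> %s\n%s\n\n" % (n, fmt(start), fmt(end), text))

Once you have the .srt, ffmpeg can mux it into an MKV without re-encoding (ffmpeg -i talk.mkv -i talk.srt -c copy talk_subbed.mkv); for MP4 the subtitle stream needs converting to mov_text rather than copied.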




If you have a good transcript, you can temporarily upload the video to YouTube (mark it as private if you like; it only needs to be accessible to you) - then, in the "Captions and Subtitles" options, upload the plain text transcript.

YouTube will sync the transcript to the audio (this removes the ambiguity of it having to guess what is being said, as you're telling it - so now it knows which words it's listening out for), and you can download the resulting automatically timed file as an SBV or SRT file.

It's not 100% perfect, and is something of a hack, but it usually works pretty well. :)
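
For reference, the SBV you download is just timestamp pairs followed by the caption text, roughly like:

    0:00:01.000,0:00:04.500
    Hello and welcome to !!Con.

    0:00:04.500,0:00:08.200
    Our first talk is about keyboards.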


http://amara.readthedocs.org/en/latest/index.html (Amara: Create Captions and Subtitles) might be of interest to you.


Thank you! Those docs foxed me completely but I did take a look at Amara's main site - an interesting approach. Not a magic bullet but I love the principle of opening up tasks like these to the crowd.


Can anyone offer pointers to efficient algorithms for achieving this? It seems like the naive way would be to run voice recognition on the video, line up the matches with the transcript to get anchor points, and then interpolate between those points to build a word -> timestamp index - rough sketch below.
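
Roughly, the interpolation step I have in mind, in Python (the anchor list is made-up recognizer output, not from any real ASR library):

    # Given sparse (word_index, seconds) anchors where the recognizer
    # confidently matched the transcript, estimate a time for every
    # word by linear interpolation between neighbouring anchors.
    # (A real implementation would also extrapolate before the first
    # anchor and after the last one.)
    def interpolate_times(num_words, anchors):
        times = [None] * num_words
        for (i0, t0), (i1, t1) in zip(anchors, anchors[1:]):
            span = max(i1 - i0, 1)
            for i in range(i0, i1 + 1):
                times[i] = t0 + (t1 - t0) * (i - i0) / span
        return times

    anchors = [(0, 0.0), (50, 21.5), (120, 60.2)]  # hypothetical matches
    print(interpolate_times(121, anchors)[75])     # ~35.3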


We work with large volumes of subtitles here, and that's basically what we're aiming to do. There are a couple of commercial solutions that still do a poor job for a hefty price tag, if that's the only thing you need from them. It's not profitable compared with human sync unless you have to sync more than a couple hundred hours of content.


The BBC developed an "Assisted Subtitling" system which on paper is pretty fancy - accepts scripts in most common formats, automatically determines optimum colours for each speaker, processes video to find shot changes, uses voice recognition to spot the dialogue at the right points, and turns out an almost-complete subtitle file that just needs a look over by a human to ensure it's sensible.

Better still, it's now open source - although sadly the voice recognition part of it relies on a closed source commercial product which (at first glance) might not even still be available.

Dangerous in the wrong hands, but interesting: http://subtitling.sourceforge.net/



