Very cool. What's the current best tool to help make the .srt file? e.g. the current best-of-breed text recognition / text alignment tool. Last time I looked at something like this there didn't seem to be a particularly robust solution, especially for text alignment.
Hello! Original author here. I'm not sure about the best tool for creating new .srt files, but for films, you can find pretty much anything on opensubtitles.org or subscene.com (although the quality varies). You can also download .srts for youtube videos (again, quality varies).
I will caution people that downloaded subs are very often misaligned (timing-wise) with your video, due to different cuts, intro logos, disc versions, framerates, etc. Leave your download page open and check each file against the source video before committing to it.
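When the whole file is off by a constant amount (intro logo, different disc version), you can often fix it by shifting every timestamp. A minimal Python sketch, assuming a standard .srt with `HH:MM:SS,mmm` timestamps (`shift_srt` is just an illustrative name):

```python
import re

def shift_srt(srt_text, offset_ms):
    """Shift every timestamp in an .srt file by offset_ms milliseconds."""
    def to_ms(h, m, s, ms):
        return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

    def fmt(total):
        total = max(total, 0)  # clamp rather than emit negative times
        h, rest = divmod(total, 3600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    # .srt timestamps look like 00:01:02,345
    return re.sub(
        r"(\d{2}):(\d{2}):(\d{2}),(\d{3})",
        lambda m: fmt(to_ms(*m.groups()) + offset_ms),
        srt_text,
    )

print(shift_srt("1\n00:00:05,000 --> 00:00:07,500\nHello.\n", 1500))
```

This won't help if the two versions drift over time (framerate mismatch); that needs a linear rescale rather than a constant offset.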
I think Aegisub (http://www.aegisub.org/) is the current gold standard in the anime community, which is one of the most active when it comes to subtitles.
Last time I did this (~2011) "Subtitle Editor"[1] was a pretty decent solution for *nix-based systems.
On Windows, VisualSubSync[2] was all the rage, iirc.
I wanted to do exactly this, but at word-level granularity, so that you could input any text and get a video of random clips from many sources, each saying exactly the words in the input string.
Word (and phoneme) level granularity is usually used for lip-synching (CG, video games) and karaoke-type applications.
If you have an accurate text transcript but not detailed enough timings, you can run speech recognition on the audio and it will be very accurate, since you already know exactly what is being said (unlike speech recognition on arbitrary speech, this is more like command-and-control). By pairing an accurate transcript with the original audio you can get word-level or even phoneme-level timing granularity.
The metadata isn't there in regular subtitles, but you can certainly get it there with some post-processing.
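For a crude version of that post-processing, without any speech recognition at all, you can just spread a cue's duration across its words in proportion to word length. This is only a rough approximation, not real forced alignment, but it's sometimes good enough for karaoke-style highlighting (`word_timings` is an illustrative name):

```python
def word_timings(text, start_ms, end_ms):
    """Crudely estimate per-word timings inside one subtitle cue by
    distributing the cue's duration proportionally to word length."""
    words = text.split()
    total_chars = sum(len(w) for w in words) or 1
    duration = end_ms - start_ms
    timings, cursor = [], float(start_ms)
    for w in words:
        span = duration * len(w) / total_chars
        timings.append((w, round(cursor), round(cursor + span)))
        cursor += span
    return timings

for word, start, end in word_timings("the quick brown fox", 1000, 3000):
    print(word, start, end)
```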
Annosoft made a command-line front-end to the Microsoft Speech API, which is what many of these other Windows-based systems may also use, and I used in a project in 1999-2001: http://www.annosoft.com/sapi_lipsync/docs/ (There are other SAPI front-ends if you dig around online, too.)
You could get a gigantic bunch of film & television subtitles. Almost every movie ever made and every TV show in the past 30 years has pretty good subtitles available from a variety of online sources.
Using common Python NLP techniques you could very easily search for every instance of a phrase across a massive corpus of subtitles.
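You don't even need much NLP for the search itself. A hedged sketch that scans a directory of .srt files for a phrase and reports where each hit starts (assumes well-formed .srt files with blank-line-separated cues; `find_phrase` is an illustrative name):

```python
import re
from pathlib import Path

TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def find_phrase(subs_dir, phrase):
    """Yield (filename, start_time, cue_text) for every cue containing the phrase."""
    needle = phrase.lower()
    for srt in Path(subs_dir).glob("*.srt"):
        for block in srt.read_text(errors="ignore").split("\n\n"):
            lines = block.strip().splitlines()
            timing = next((l for l in lines if TIMESTAMP.match(l)), None)
            if timing is None:
                continue
            text = " ".join(lines[lines.index(timing) + 1:])
            if needle in text.lower():
                yield srt.name, timing.split(" --> ")[0], text
```

From there you'd map each hit back to its video file and cut the clip at the reported timestamp.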
If you got a large enough collection of subtitles and videos in a single directory this tool would do what you are asking.
How about a program that automatically finds, say, the smallest number of segments that covers all of the input words? At least then the program would do a lot of the heavy lifting for you, and you could do the last bit of cleanup manually.
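Finding the true minimum is set cover, which is NP-hard, but the standard greedy approximation (repeatedly pick the segment covering the most still-uncovered words) usually does fine at this scale. A minimal sketch, treating each candidate segment as a list of words (`greedy_cover` is an illustrative name):

```python
def greedy_cover(input_words, segments):
    """Greedily pick segments until every input word is covered;
    the classic approximation algorithm for set cover."""
    needed = set(input_words)
    chosen = []
    while needed:
        best = max(segments, key=lambda seg: len(needed & set(seg)))
        gained = needed & set(best)
        if not gained:  # some word appears in no segment at all
            break
        chosen.append(best)
        needed -= gained
    return chosen, needed  # chosen segments, words left uncovered

segs = [["i", "will", "be"], ["be", "back"], ["i", "will", "be", "back"]]
print(greedy_cover(["i", "will", "be", "back"], segs))
```

Note this only minimizes segment count; it ignores word order, so the manual cleanup step would still involve sequencing the clips.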
Nice tool, also I had no idea about the moviepy library used by the tool. Looks like a really nice little library for making small video edits in python. Cool!
I remember reading a while back that employees at big news networks (think Fox, CNBC, CNN, etc.) had access to some massive database of broadcast videos, and tooling built around the database to do exactly this. I can't find the source at the moment, but if anyone knows it, a link could be relevant.
One thing that annoys me with subtitles is when they include all the sound effects: [SCREAMS LOUDLY], [OMINOUS MUSIC PLAYS], etc. So something like the Total Recall silence trick probably won't work to a great degree of accuracy in those cases.
Some places call this "closed captioning" (i.e. deaf target audience), versus "subtitles" (target audience is people who can hear, but not understand the language).
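If you're stuck with closed captions, a quick preprocessing pass can strip most of those tags, since they're almost always wrapped in brackets or parentheses. A rough sketch (the regex is a heuristic, not a guarantee; `strip_effects` is an illustrative name):

```python
import re

def strip_effects(caption):
    """Remove bracketed/parenthesised sound-effect tags such as
    [SCREAMS LOUDLY] or (ominous music) from a caption line."""
    cleaned = re.sub(r"[\[(][^\])]*[\])]", "", caption)
    return re.sub(r"\s{2,}", " ", cleaned).strip()

print(strip_effects("[OMINOUS MUSIC PLAYS] Get to the chopper!"))
```

It will also eat legitimate parentheticals in dialogue, so it's a lossy cleanup rather than a proper caption parser.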
Solution: use foreign language subtitles (since the goal is translation and not helping with hearing impaired people, those effects aren't usually included).