Very cool. What's the current best tool to help make the .srt file? e.g. the current best-of-breed text recognition / text alignment tool. Last time I looked at something like this there didn't seem to be a particularly robust solution, especially for text alignment.
Hello! Original author here. I'm not sure about the best tool for creating new .srt files, but for films, you can find pretty much anything on opensubtitles.org or subscene.com (although the quality varies). You can also download .srts for youtube videos (again, quality varies).
I will caution people that downloaded subs are very often misaligned (timing-wise) with your video, due to different cuts, intro logos, disc versions, framerates, etc. Leave your download page open and check each file against the source video before committing to it.
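When the whole file is off by a constant amount (intro logo, different disc version), you can often fix it by shifting every timestamp. A minimal Python sketch, assuming a standard .srt with `HH:MM:SS,mmm` timestamps (`shift_srt` is just an illustrative name):

```python
import re

def shift_srt(srt_text, offset_ms):
    """Shift every timestamp in an .srt file by offset_ms milliseconds."""
    def to_ms(h, m, s, ms):
        return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

    def fmt(total):
        total = max(total, 0)  # clamp rather than emit negative times
        h, rest = divmod(total, 3600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    # .srt timestamps look like 00:01:02,345
    return re.sub(
        r"(\d{2}):(\d{2}):(\d{2}),(\d{3})",
        lambda m: fmt(to_ms(*m.groups()) + offset_ms),
        srt_text,
    )

print(shift_srt("1\n00:00:05,000 --> 00:00:07,500\nHello.\n", 1500))
```

This won't help if the two versions drift over time (framerate mismatch); that needs a linear rescale rather than a constant offset.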
I think Aegisub (http://www.aegisub.org/) is the current gold standard in the anime community, which is one of the most active when it comes to subtitles.
Last time I did this (~2011) "Subtitle Editor"[1] was a pretty decent solution for *nix-based systems.
On Windows, VisualSubSync[2] was all the rage, iirc.
I wanted to do exactly this, but at word-level granularity, so that you could input any text and get a video of random clips from many sources, each saying exactly the words in the input string.
Word (and phoneme) level granularity is usually used for lip-synching (CG, video games) and karaoke-type applications.
If you have an accurate text transcript but not detailed enough timings, you can run speech recognition on the audio and it will be very accurate, since you already know exactly what is being said (unlike speech recognition on arbitrary speech, this is more like command-and-control). By pairing an accurate transcript with the original audio you can get word-level or even phoneme-level timing granularity.
The metadata isn't there in regular subtitles, but you can certainly get it there with some post-processing.
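For a crude version of that post-processing, without any speech recognition at all, you can just spread a cue's duration across its words in proportion to word length. This is only a rough approximation, not real forced alignment, but it's sometimes good enough for karaoke-style highlighting (`word_timings` is an illustrative name):

```python
def word_timings(text, start_ms, end_ms):
    """Crudely estimate per-word timings inside one subtitle cue by
    distributing the cue's duration proportionally to word length."""
    words = text.split()
    total_chars = sum(len(w) for w in words) or 1
    duration = end_ms - start_ms
    timings, cursor = [], float(start_ms)
    for w in words:
        span = duration * len(w) / total_chars
        timings.append((w, round(cursor), round(cursor + span)))
        cursor += span
    return timings

for word, start, end in word_timings("the quick brown fox", 1000, 3000):
    print(word, start, end)
```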
Annosoft made a command-line front-end to the Microsoft Speech API, which is what many of these other Windows-based systems may also use, and I used in a project in 1999-2001: http://www.annosoft.com/sapi_lipsync/docs/ (There are other SAPI front-ends if you dig around online, too.)
You could get a gigantic bunch of film & television subtitles. Almost every movie ever made and every TV show in the past 30 years has pretty good subtitles available from a variety of online sources.
Using common Python NLP techniques you could very easily search for every instance of a phrase across a massive corpus of subtitles.
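You don't even need much NLP for the search itself. A hedged sketch that scans a directory of .srt files for a phrase and reports where each hit starts (assumes well-formed .srt files with blank-line-separated cues; `find_phrase` is an illustrative name):

```python
import re
from pathlib import Path

TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def find_phrase(subs_dir, phrase):
    """Yield (filename, start_time, cue_text) for every cue containing the phrase."""
    needle = phrase.lower()
    for srt in Path(subs_dir).glob("*.srt"):
        for block in srt.read_text(errors="ignore").split("\n\n"):
            lines = block.strip().splitlines()
            timing = next((l for l in lines if TIMESTAMP.match(l)), None)
            if timing is None:
                continue
            text = " ".join(lines[lines.index(timing) + 1:])
            if needle in text.lower():
                yield srt.name, timing.split(" --> ")[0], text
```

From there you'd map each hit back to its video file and cut the clip at the reported timestamp.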
If you got a large enough collection of subtitles and videos in a single directory this tool would do what you are asking.
How about a program that automatically finds, say, the smallest number of segments that covers all of the input words? At least then the program would do a lot of the heavy lifting for you, and you could do the last bit of cleanup manually.
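Finding the true minimum is set cover, which is NP-hard, but the standard greedy approximation (repeatedly pick the segment covering the most still-uncovered words) usually does fine at this scale. A minimal sketch, treating each candidate segment as a list of words (`greedy_cover` is an illustrative name):

```python
def greedy_cover(input_words, segments):
    """Greedily pick segments until every input word is covered;
    the classic approximation algorithm for set cover."""
    needed = set(input_words)
    chosen = []
    while needed:
        best = max(segments, key=lambda seg: len(needed & set(seg)))
        gained = needed & set(best)
        if not gained:  # some word appears in no segment at all
            break
        chosen.append(best)
        needed -= gained
    return chosen, needed  # chosen segments, words left uncovered

segs = [["i", "will", "be"], ["be", "back"], ["i", "will", "be", "back"]]
print(greedy_cover(["i", "will", "be", "back"], segs))
```

Note this only minimizes segment count; it ignores word order, so the manual cleanup step would still involve sequencing the clips.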
Nice tool, also I had no idea about the moviepy library used by the tool. Looks like a really nice little library for making small video edits in python. Cool!
I remember reading a while back that employees at big news networks (think Fox, CNBC, CNN, etc.) had access to some massive database of broadcast videos, and tooling built around the database to do exactly this. I can't find the source at the moment, but if anyone knows it, a link could be relevant.
One thing that annoys me with subtitles is when they include all the sound effects: [SCREAMS LOUDLY], [OMINOUS MUSIC PLAYS], etc. So something like the Total Recall silence trick probably won't work to a great degree of accuracy in those cases.
Some places call this "closed captioning" (i.e. deaf target audience), versus "subtitles" (target audience is people who can hear, but not understand the language).
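If you're stuck with closed captions, a quick preprocessing pass can strip most of those tags, since they're almost always wrapped in brackets or parentheses. A rough sketch (the regex is a heuristic, not a guarantee; `strip_effects` is an illustrative name):

```python
import re

def strip_effects(caption):
    """Remove bracketed/parenthesised sound-effect tags such as
    [SCREAMS LOUDLY] or (ominous music) from a caption line."""
    cleaned = re.sub(r"[\[(][^\])]*[\])]", "", caption)
    return re.sub(r"\s{2,}", " ", cleaned).strip()

print(strip_effects("[OMINOUS MUSIC PLAYS] Get to the chopper!"))
```

It will also eat legitimate parentheticals in dialogue, so it's a lossy cleanup rather than a proper caption parser.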
Solution: use foreign language subtitles (since the goal is translation and not helping with hearing impaired people, those effects aren't usually included).