What’s really surprising to me is that synching subtitles isn’t a solved problem...

thaumasiotes · on Jan 20, 2022

Because synching subtitles to the video isn't even a goal.

Most people read more slowly than they can listen. Many other people can read faster than they can listen. But you need subtitles that stick around long enough for the first group, and really, you want a comfortable margin so that people aren't frantically trying to finish reading every subtitle before it disappears.

I watched the show 大江大河 ("Like a Flowing River") on Viki, which has excellent community-generated subtitling. The second season isn't available on Viki - only YouTube. (Actually, Viki seems to have lost the license to the first season by now, too.) And the subtitles are abominably bad, bad enough to make me stop watching the show. But they're perfectly synched - every subtitle on YouTube is exactly matched, millisecond-to-millisecond, with a Chinese subtitle which it attempts to translate.[1]

[1] Ignoring the timing, which is much too fast for the English subtitles, the fact that each subtitle is translated independently is another huge problem. It leads to nonsense when one sentence is split across multiple subtitles, because the English and Chinese do not naturally present the same information in the same order.
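
To make the "comfortable margin" point concrete, here is a minimal sketch of stretching each subtitle so it stays on screen long enough to read, without running into the next one. The `Cue` type, the `relax_timing` name, the 15 characters-per-second reading rate and the 500 ms margin are all assumptions picked for illustration, not values from any subtitling standard or tool.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def relax_timing(cues, chars_per_second=15.0, margin_ms=500):
    """Return new cues whose end times leave enough time to read the text.

    Assumes `cues` is sorted by start time. The reading rate and margin
    are illustrative defaults, not recommendations.
    """
    relaxed = []
    for i, cue in enumerate(cues):
        # Time needed to read the text at the assumed rate, plus a margin.
        needed_ms = int(len(cue.text) / chars_per_second * 1000) + margin_ms
        wanted_end = max(cue.end_ms, cue.start_ms + needed_ms)
        # Do not run into the next cue.
        if i + 1 < len(cues):
            wanted_end = min(wanted_end, cues[i + 1].start_ms)
        # Never make a cue shorter than it originally was.
        wanted_end = max(wanted_end, cue.end_ms)
        relaxed.append(Cue(cue.start_ms, wanted_end, cue.text))
    return relaxed
```

A cue that is followed closely by another one still ends up too short here, which is the case where a human subtitler would shorten or merge the text instead of touching the timing.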

endisneigh · on Jan 20, 2022

A system that could do what I’m describing would also solve the problem you mention, though.

cyphar · on Jan 21, 2022

With tools like alass[1] (using it to synchronise against the original language subtitles) it is about as close to solved as you can get.

All of the attempts I've seen at using audio information to synchronise subtitles have been awful. One issue is that some languages (such as Japanese) subtitle everything, even screams and incoherent shouts, while others only subtitle dialogue and often rework it to keep the subtitles short enough to read easily. It feels like you need too much domain knowledge: you have to know how different languages subtitle things, and you have to recognise that subtitles which only match the general meaning of what is being said should still be matched up.

[1]: https://github.com/kaegi/alass
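
For a rough idea of what synchronising against a reference subtitle track means, here is a minimal sketch that only looks for a single constant shift. It is not alass's algorithm, just a brute-force illustration; the function names, the 10 second search window and the 50 ms step are assumptions. Each track is a list of `(start_ms, end_ms)` spans, sorted and non-overlapping.

```python
def overlap_ms(a, b):
    """Total time during which a span from `a` and a span from `b` overlap.

    Both lists must be sorted and internally non-overlapping.
    """
    total = 0
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end > start:
            total += end - start
        # Advance whichever span finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def best_offset_ms(reference, target, max_shift_ms=10_000, step_ms=50):
    """Try every shift in the window and keep the one with the most overlap."""
    best_shift, best_score = 0, -1
    for shift in range(-max_shift_ms, max_shift_ms + 1, step_ms):
        shifted = [(s + shift, e + shift) for s, e in target]
        score = overlap_ms(reference, shifted)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift
```

This only uses timing, never the text, which is why it works across languages. It also breaks down for exactly the reason given above: if one track subtitles every scream and the other only the dialogue, the spans no longer line up one-to-one.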

echelon · on Jan 19, 2022

Have you turned on Google Meet subtitles recently? They're so good it's uncanny.

mateo1 · on Jan 20, 2022

What's the market for this? Very small.

distances · on Jan 20, 2022

Every movie, every show ever produced? Subtitles are required even for domestic markets for the hearing impaired, even if we disregard the audience that prefers to have subtitles.

pessimizer · on Jan 20, 2022

Literally everyone who subtitles anything.

sp332 · on Jan 19, 2022

I think you can do this with YouTube.