They could also be digging only into the audio, running speech recognition on it, then clustering the resulting text. Augment that with the text users have overlaid on the video via the in-app editor and you have some pretty solid data.
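A minimal sketch of that pipeline (assuming openai-whisper for the ASR and sentence-transformers plus k-means for the clustering; the model names, file paths, and cluster count are just illustrative, not anything the app actually uses):

    # Hypothetical sketch: transcribe each video's audio, embed the text, cluster it.
    # Assumes: pip install openai-whisper sentence-transformers scikit-learn
    import whisper
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    # 1. Speech recognition on the video's audio track.
    asr = whisper.load_model("base")
    transcripts = [asr.transcribe(path)["text"] for path in ["clip1.mp4", "clip2.mp4"]]

    # 2. Append any text the user typed in the in-app editor (made-up examples).
    overlays = ["cooking pasta at home", "day 3 of my gym challenge"]
    docs = [t + " " + o for t, o in zip(transcripts, overlays)]

    # 3. Embed the combined text and cluster the vectors.
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = embedder.encode(docs)
    labels = KMeans(n_clusters=2, n_init="auto").fit_predict(vectors)
    print(labels)  # one cluster id per video

The nice property is that everything downstream of the ASR step is just ordinary text clustering, so the overlay text drops in for free.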



If that were true, it'd be interesting to see if they push out support for closed captioning. It's an accessibility push, but it would also leverage a lot of the same capabilities...


I would also start doing image recognition on the video frames to extract things like gender, objects, etc.
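Something like this, sampling frames with OpenCV and tagging them with an off-the-shelf torchvision classifier (the sampling rate and ImageNet label set are illustrative stand-ins, not a production tagger):

    # Hypothetical sketch: sample frames from a video, run object recognition on each.
    # Assumes: pip install opencv-python torch torchvision
    import cv2
    import torch
    from torchvision.models import resnet50, ResNet50_Weights

    weights = ResNet50_Weights.DEFAULT
    model = resnet50(weights=weights).eval()
    preprocess = weights.transforms()

    cap = cv2.VideoCapture("clip.mp4")
    tags, frame_idx = set(), 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % 30 == 0:  # roughly one frame per second at 30 fps
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            batch = preprocess(torch.from_numpy(rgb).permute(2, 0, 1)).unsqueeze(0)
            with torch.no_grad():
                top = model(batch).softmax(dim=1).argmax().item()
            tags.add(weights.meta["categories"][top])
        frame_idx += 1
    cap.release()
    print(tags)  # a set of object tags per video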


Would this have any advantage over just using video embeddings (or a sequence of frame embeddings), which in theory should capture those things in vectorized form?
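For comparison, a sketch of the embedding approach, assuming CLIP via Hugging Face transformers with mean-pooled frame embeddings standing in for a true video embedding:

    # Hypothetical sketch: per-frame CLIP embeddings, mean-pooled into one vector.
    # Assumes: pip install opencv-python torch transformers pillow
    import cv2
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def video_embedding(path, every_n=30):
        cap = cv2.VideoCapture(path)
        frames, i = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if i % every_n == 0:
                frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
            i += 1
        cap.release()
        inputs = processor(images=frames, return_tensors="pt")
        with torch.no_grad():
            feats = model.get_image_features(**inputs)  # one vector per sampled frame
        return feats.mean(dim=0)  # crude video-level embedding

    vec = video_embedding("clip.mp4")
    print(vec.shape)  # torch.Size([512])

Mean-pooling throws away temporal order, which is partly why explicit tags (or a sequence model over the frame embeddings) can still add signal on top.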



