
I wonder how much more a model would learn about subtitles from including audio AND video in training. Sure, the costs would be way higher (parsing video, even deterministically, is about 1.5 orders of magnitude more expensive than audio), but it might help with the edge cases where the speech is so unclear that even the subtitle scene can't agree.



[flagged]


this sounds like chatgpt drivel


nah, Google Bard



