It's not that straightforward. I would say that the most important part of pre-processing is how to break the transcript into parts.
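The comment doesn't say how the transcript is actually split. Here is a minimal sketch of one common approach. It assumes a character budget stands in for the model's context limit (a real pipeline would count tokens) and that sentence ends are reasonable split points. `split_transcript` and `max_chars` are hypothetical names used only for illustration.

```python
import re


def split_transcript(text: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack sentences into chunks of at most max_chars characters."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            # The sentence still fits in the current chunk
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the budget is hard-wrapped
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks


example = "First point. Second point! Is this the third? Yes."
print(split_transcript(example, max_chars=30))
# ['First point. Second point!', 'Is this the third? Yes.']
```

Splitting on sentence boundaries keeps each chunk readable on its own. Smarter variants might split on topic changes or speaker turns, or add overlap between chunks.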
But there is much more to improve. It's a pipeline: you can add more models, train them, change the prompts, post-process the results. But all of that takes time, and each step takes more time than the last.
I have tried many of the Hugging Face summarization models (all the popular ones), and they don't compare to GPT. Basically, they highlight the parts they consider important, but they can't "understand" the content and generalize.
Frankly, I haven't read the papers on summarization, but I will have to when I work on reducing costs.
I see, thank you for the insight. The OpenAI paper about summarization is an easy read and very similar to your approach, if you are looking for somewhere to start: https://openai.com/blog/summarizing-books/.
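The linked post describes recursive task decomposition: summarize small sections, then summarize those summaries, and repeat until one summary remains. It also trains the model with human feedback, which is out of scope here. Below is a minimal sketch of only the recursive control flow. It assumes a pluggable `summarize` function (hypothetical; in practice a GPT call or a Hugging Face model) and uses word counts as a rough stand-in for tokens. `chunk_words` and `recursive_summarize` are illustrative names, not taken from the post.

```python
from typing import Callable


def chunk_words(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def recursive_summarize(
    text: str, summarize: Callable[[str], str], max_words: int = 500
) -> str:
    """Summarize chunks, join the summaries, and repeat until the text fits in one chunk."""
    while len(text.split()) > max_words:
        summaries = [summarize(chunk) for chunk in chunk_words(text, max_words)]
        new_text = " ".join(summaries)
        # Guard: a summarizer that doesn't shrink its input would loop forever
        if len(new_text.split()) >= len(text.split()):
            raise ValueError("summarizer did not reduce the text length")
        text = new_text
    # Final pass over text that now fits in a single chunk
    return summarize(text)


# Toy stand-in for a real model: keep the first three words of each chunk
def first_three(chunk: str) -> str:
    return " ".join(chunk.split()[:3])


doc = " ".join(f"w{i}" for i in range(20))
print(recursive_summarize(doc, first_three, max_words=5))
# w0 w1 w2
```

Because the summarizer is passed in as a function, you can swap it (GPT, a local model, or a different prompt per level) without touching the recursion.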
TorToiSe collapses like this quite often, especially on tokens that wouldn't have appeared in its training data, which likely consisted largely of transcribed speech and read text such as audiobooks. Prompting it with "hahaha" to make it laugh (something you rarely see in that kind of data) often gets you demon and zombie noises.
It uses the TorToiSe TTS model for generation, which makes it simple to compute voice-conditioning latents from short audio samples. Transcribed JRE episodes were likely part of TorToiSe's training data, which would explain why it is so good at recreating his voice in particular.
That generation uses tortoise-tts. Play.ht has a model called Peregrine, and I've started calling it from a script. Super cool company and API. I just haven't had time to release my next-gen version.
Interesting application! It hadn't occurred to me. To my shame, the best I've come up with is cutting ads out of influencers' videos (making a rewind button).
I'll chime in with hearty agreement. Being able to let my son watch what he finds without my having to watch it all first (as I do now)? I hope this product/service is available in about 4-5 years, when he's at an age where his interests will hopefully start diverging from ours.
There are probably loads of other parents who would love this.
After installing the extension, go to YouTube. You should see a "Sign in with Google" button where the related videos usually appear. If something goes wrong, please send me a screenshot at support@eightify.app.
When I first got into it, there weren't many good resources. Now there are a hundred times more articles, but I haven't read them, so I can't recommend anything yet.