I can post a URL to the output once it's finished running, if it'd be of any use to anyone. Oh, and be warned, there's a strong chance that it's buggy. It's certainly not optimised (no threads).
I'd love a list of IDs. I'm doing a research project, a search engine for lectures (https://www.findlectures.com), and I'm interested to see if there is any overlap.
It seems like it'd be interesting to compare their tagging with what is in the video transcripts.
EDIT: The script has now run. I've scraped ~10,000,000 video IDs, but only ~5.5m of them are unique, so there's probably a bug in my script somewhere (but I need sleep now). Files containing IDs for various categories are listed here: https://redfern.me/public/yt8m/, some notes are here: https://redfern.me/public/yt8m/README.md, and a .tar.gz archive is available here: https://redfern.me/public/yt8m/yt8m-ids-probably-incomplete.....
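Before hunting for a bug, it may be worth checking where the duplicates come from. One possible explanation, which is an assumption and not verified here, is that YouTube-8M is multi-label, so the same video can legitimately show up in several per-category files. A rough sketch for counting total vs. unique IDs across the category files follows. It assumes (hypothetically) one plain-text `*.txt` file per category with one ID per line; `count_ids` and `load_dir` are illustrative names, not part of the published files.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path
from typing import Iterable


def count_ids(lines_per_file: dict[str, Iterable[str]]) -> tuple[int, int, Counter]:
    """Return (total IDs, unique IDs, per-ID occurrence counts) across all files."""
    counts: Counter = Counter()
    for lines in lines_per_file.values():
        for line in lines:
            video_id = line.strip()
            if video_id:  # ignore blank lines
                counts[video_id] += 1
    return sum(counts.values()), len(counts), counts


def load_dir(directory: str) -> dict[str, list[str]]:
    # Hypothetical layout: one "<category>.txt" file per category, one ID per line.
    return {p.stem: p.read_text().splitlines() for p in Path(directory).glob("*.txt")}


if __name__ == "__main__":
    demo = {"Games": ["abc123", "def456"], "Music": ["abc123", "ghi789"]}
    total, unique, counts = count_ids(demo)
    print(total, unique)  # 4 3
    print([vid for vid, n in counts.items() if n > 1])  # ['abc123']
```

If most repeated IDs turn out to appear once per distinct category, the ~10m vs ~5.5m gap would be expected rather than a scraping bug; IDs repeated within a single category file would point back at the script.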