
If, for some reason, you want a list of all of the video IDs (I couldn't easily find one), I wrote a crappy scraper to pull them out: https://gist.github.com/JosephRedfern/d60bdc584d84b1451cc605....

I can post a URL to the output once it's finished running, if it'd be of any use to anyone. Oh, and be warned, there's a strong chance that it's buggy. It's certainly not optimised (no threads).

EDIT: The script has now run. I've scraped ~10 million video IDs, but only ~5.5 million of them are unique, so there's probably a bug in my script somewhere (but I need sleep now). Files containing IDs for various categories are listed here: https://redfern.me/public/yt8m/, some notes are here: https://redfern.me/public/yt8m/README.md, and a .tar.gz'd archive is available here: https://redfern.me/public/yt8m/yt8m-ids-probably-incomplete.....
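
For anyone who wants to check the total-vs-unique numbers on their own copy, here is a rough sketch of the kind of count I mean. It is not taken from the scraper itself. The function name `count_ids` is made up for illustration, and it assumes the IDs arrive one per line, which may not match the actual file layout.

```python
from typing import Iterable, Tuple


def count_ids(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total, unique) counts of non-empty IDs.

    Assumes one ID per line (an assumption, not a documented format).
    """
    total = 0
    seen = set()
    for line in lines:
        video_id = line.strip()
        if not video_id:
            continue  # skip blank lines
        total += 1
        seen.add(video_id)
    return total, len(seen)


# Made-up sample IDs, with one duplicate and one blank line.
sample = ["abc123XYZ_0\n", "def456UVW-1\n", "abc123XYZ_0\n", "\n"]
total, unique = count_ids(sample)
print(f"total={total} unique={unique}")
```

Since an open file is also an iterable of lines, the same function can be pointed at a file handle directly. A big gap between the two counts could just mean the same video shows up under several categories, rather than a scraper bug.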




I'd love a list of IDs. I'm working on a research project, a search engine for lectures (https://www.findlectures.com), and I'm interested to see whether there is any overlap.

It seems like it'd be interesting to explore their tagging compared to what is in video transcripts.


I've updated my original comment with some URLs.


Awesome, thanks!



