
> There’s not 9 other wikipedias

By the way, I wonder how much you could get from "history" data: Wikipedia history pages, talk pages, commit diffs on GitHub, pull request discussions, etc.

AFAIK, so far we've only been using the finished code "artifacts", but if we're desperate for more tokens to train on, we might get a lot of mileage from just "all the different versions of this dataset over time".
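To make the idea concrete, here is a rough sketch of turning a revision history into edit-style training examples. It assumes you already have the successive versions of a document as plain strings; the function name `revision_pairs_to_examples` and the `before`/`diff` record layout are made up for illustration, not taken from any real pipeline.

```python
import difflib


def revision_pairs_to_examples(revisions):
    """Turn an ordered list of revision texts into (before, diff) records.

    Hypothetical helper: each consecutive pair of revisions becomes one
    example holding the old text and a unified diff to the new text.
    """
    examples = []
    for old, new in zip(revisions, revisions[1:]):
        diff = "".join(
            difflib.unified_diff(
                old.splitlines(keepends=True),
                new.splitlines(keepends=True),
                fromfile="before",
                tofile="after",
            )
        )
        if diff:  # identical revisions produce an empty diff; skip them
            examples.append({"before": old, "diff": diff})
    return examples


# Toy revision history standing in for a wiki page or a source file.
revisions = [
    "The cat sat.\n",
    "The cat sat on the mat.\n",
    "The cat sat on the mat.\n",  # no-op revision
    "The black cat sat on the mat.\n",
]
examples = revision_pairs_to_examples(revisions)
for example in examples:
    print(example["diff"])
```

Whether diffs like these are actually useful training tokens is an open question; the sketch only shows that one page with N meaningful revisions yields N-1 examples instead of one.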




