
“Four decades of the internet (accelerated by COVID) has given us trillions of tokens’ worth of training data.”

What’s up with the “accelerated by COVID”? It feels completely out of place. Would we not have had enough training data if COVID hadn’t happened? A blessing in disguise, I guess.




I’m mildly curious if those additional tokens will make AI better, or worse.

We arguably trained AI on the good stuff first. Novels. Wikipedia entries. GitHub open-source projects with a lot of stars. What’s left but mediocrity and our “baser” internet ramblings?

Some researchers have already found that training on AI-generated content can degrade models, but what about content from increasingly out-of-touch people?


The only way I can interpret that is that it refers to the massive uptick in video conferencing COVID caused. Although I'm not sure what data programs like Zoom actually collected, or whether it was shared with generative AI companies.



