version_five on April 23, 2023 | on: Will we run out of ML data? Evidence from projecti...

Lots of reasons this isn't universally true: it only works if you know enough about the data to simulate it, and you're stuck within some distribution + human guesses space that's not all-encompassing.

The easiest counterexample is training LLMs: how are you going to synthesize useful language examples if you want more? Some version of this is true for most applications.

bertday on April 23, 2023

Yeah, the issue is you can generate data, but it won’t be good data. Training over random strings won’t make you learn language, but it’s technically data.
