
Lots of reasons this isn't universally true - it only works if you know enough about the data to simulate it, and you're stuck within some "distribution + human guesses" space that's not all-encompassing.

The easiest counterexample is training LLMs: how are you going to synthesize useful language examples if you want more? Some version of this is true for most applications.




Yeah, the issue is that you can generate data, but it won’t be good data. Training on random strings won’t teach a model language, but it’s technically data.



