Not sure that synthetic or LLM-generated training data is as useful as human-generated text.

It seems "good enough" (for now), but synthetic data makes up a very small proportion of the training sets used in current models. If that proportion ends up being mostly synthetic, we'll likely see whatever weird hallucinations and biases exist in the dominant backend (GPT-4 or whatever) become amplified.
It's been shown repeatedly that garbage in = garbage out for training data.
Agree about synthetic data. My point is that AI-powered applications deployed in production generate more _real_ data which can be used for training. For example, self-driving cars generate tons of data about how their models perform, simply as a result of driving around. Similarly, code-writing AI applications generate feedback in the form of errors, logs, etc. which can be fed back into the models as training data.
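To make that concrete, here's a minimal sketch of what such a feedback loop might look like for a code-writing application. Everything here (the function name, the JSONL format, running code via a bare subprocess) is an assumption for illustration; a real system would sandbox execution and filter what it keeps:

```python
import json
import subprocess
import sys
import tempfile

def record_execution_feedback(generated_code: str,
                              dataset_path: str = "feedback.jsonl") -> None:
    """Hypothetical sketch: run model-generated code and append the outcome
    (exit code, stdout, stderr) to a JSONL file for later training use."""
    # Write the generated snippet to a temp file so we can execute it.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code)
        script = f.name
    try:
        result = subprocess.run(
            [sys.executable, script],
            capture_output=True, text=True, timeout=30,
        )
        outcome = {
            "code": generated_code,
            "exit_code": result.returncode,
            "stdout": result.stdout,
            "stderr": result.stderr,  # tracebacks are the error signal
        }
    except subprocess.TimeoutExpired:
        outcome = {"code": generated_code, "exit_code": None,
                   "error": "timeout"}
    # Append one example per run; this file becomes real-world training data.
    with open(dataset_path, "a") as out:
        out.write(json.dumps(outcome) + "\n")
```

The point isn't this specific plumbing; it's that every deployment produces ground-truth signal (did the code run, what broke) that no amount of purely synthetic generation provides.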
It seems "good enough" (for now) but synthetic makes up a very small proportion of the training set being used in current models that have been trained on it, if that proportion ends up being mostly synthetic we'll likely see whatever weird hallucinations and biases in the dominant backend (GPT4 or whatever) become amplified.
It's been shown repeatedly that garbage in = garbage out for training data.