
> Because so far we've never seen an AI improve based on its own output.

Maybe it's because AI is such an overloaded term, but this is pretty commonplace for (semi-)supervised learning algorithms.

Pseudo-labeling [1,2] is an example of this that has been around for decades. When done properly, it does improve the performance of the original model, up to a certain limit (far from the singularity).
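For intuition, a minimal pseudo-labeling round looks something like this (a sketch with scikit-learn; the 0.95 confidence threshold and single retraining pass are illustrative choices on my part, not prescribed by [1] or [2]):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=2000, random_state=0)
    X_lab, y_lab, X_unlab = X[:200], y[:200], X[200:]

    # Train on the small labeled set, then let the model label the
    # unlabeled pool and keep only its confident predictions.
    model = LogisticRegression().fit(X_lab, y_lab)
    proba = model.predict_proba(X_unlab)
    keep = proba.max(axis=1) > 0.95

    # Retrain on the real labels plus the model's own pseudo-labels.
    X_aug = np.vstack([X_lab, X_unlab[keep]])
    y_aug = np.concatenate([y_lab, proba[keep].argmax(axis=1)])
    model = LogisticRegression().fit(X_aug, y_aug)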

Moreover, it is apparently possible to improve a model's performance by augmenting its training set with synthetic examples generated by a second model [3].
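A toy, runnable version of that idea, with a per-class Gaussian mixture standing in for the much fancier generative model used in [3]:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def augment(X_lab, y_lab, n_per_class=500):
        # Fit a small generative model per class, sample synthetic
        # examples from it, and append them to the training set.
        X_parts, y_parts = [X_lab], [y_lab]
        for c in np.unique(y_lab):
            gm = GaussianMixture(n_components=3, random_state=0)
            gm.fit(X_lab[y_lab == c])
            X_syn, _ = gm.sample(n_per_class)
            X_parts.append(X_syn)
            y_parts.append(np.full(n_per_class, c))
        return np.vstack(X_parts), np.concatenate(y_parts)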

Finally, boosting [4] can also be seen as iteratively leveraging the output of a model to train a slightly better model. In fact, a specific type of boosting often yields state-of-the-art performance on tabular data.
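That specific type is presumably gradient boosting, the engine behind XGBoost and LightGBM. A bare-bones version of the loop, fitting each new tree to the residuals of the ensemble built so far (squared loss; the hyperparameters here are arbitrary):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost(X, y, n_rounds=100, lr=0.1):
        # Start from the mean prediction; each round fits a shallow
        # tree to the current residuals (i.e. to the errors in the
        # output of the models trained so far) and adds it in.
        pred = np.full(len(y), y.mean())
        trees = []
        for _ in range(n_rounds):
            tree = DecisionTreeRegressor(max_depth=3).fit(X, y - pred)
            pred += lr * tree.predict(X)
            trees.append(tree)
        # Predict on new data with:
        #   y.mean() + lr * sum(t.predict(X_new) for t in trees)
        return trees, y.mean()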

[1] https://arxiv.org/abs/2101.06329

[2] https://stats.stackexchange.com/questions/364584/why-does-us...

[3] https://arxiv.org/abs/2304.08466

[4] https://en.m.wikipedia.org/wiki/Boosting_(machine_learning)




This really only works well in resource-limited settings and/or on semi-supervised tasks.

I've tried augmentation for LLM domain adaptation, and it yields very modest gains in the best of situations; even then, the augmented corpus is a tiny fraction of the underlying training corpus.

I believe the OP's question was getting at whether synthetic data is useful as a substantial corpus for unsupervised training of a language model (given the topic, it's reasonable to disregard other areas of "AI"), and that answer appears to be no, or at least unproven and non-intuitive.


Boosting is reminiscent of the wisdom-of-the-crowd effect.



