You're confusing it with data poisoning.

Model collapse itself is (or was?) a fairly serious research topic: https://arxiv.org/abs/2305.17493

We've by now reached "probably not inevitable" - https://arxiv.org/abs/2404.01413 argues there's a finite upper bound on the error - but I'd also point out that that paper assumes the training data's cardinality increases with the number of training generations and that data strictly accumulates: real data is never thrown away, synthetic data is only ever added on top.
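
To make the replace-vs-accumulate distinction concrete, here's a toy single-Gaussian version of the recursive-training loop (my own sketch, not the setup from either paper; the distribution, n, and generation count are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate(mode, generations=1000, n=100):
        # Toy "model collapse" loop: each generation, "training" is fitting
        # a Gaussian to the data pool, and "generating" is sampling from the
        # fit. "replace" trains only on the previous generation's synthetic
        # output; "accumulate" never discards data, which is the assumption
        # the bounded-error argument relies on.
        pool = rng.normal(0.0, 1.0, n)  # stand-in for the original real data
        for _ in range(generations):
            mu, sigma = pool.mean(), pool.std()   # "train"
            synthetic = rng.normal(mu, sigma, n)  # "generate"
            if mode == "replace":
                pool = synthetic                          # old data discarded
            else:
                pool = np.concatenate([pool, synthetic])  # data accumulates
        return pool.var()

    for mode in ("replace", "accumulate"):
        print(f"{mode:10s} variance after 1000 generations: {simulate(mode):.4f}")

In replace mode the fitted variance does a multiplicative random walk with downward drift, so it grinds toward zero; with accumulation the ever-growing pool anchors the fit near the original distribution, which is the intuition behind the finite error bound.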

To first order, that means you'd better have a pre-2022 dataset to get started, and have archived it well.

But it's probably fair to say the current SOTA is still more or less "it's neither impossible nor inevitable".



Oh, no, they definitely believe both are going to happen and ChatGPT is just going to stop working because it'll see itself on the internet. It goes with the common belief that LLMs learn from what you type into them.

> To a first order, that means you better have a pre-2022 dataset to get started, and have archived it well.

I think that will always be available, or at least, a dataset with the distribution you want will be available.


I don't know why you have such disdain for artists, but either way, the original point was that model collapse wasn't "a coping idea made up by artists" but a valid, research-backed scientific model.

>I think that [clean pre-2022 data set] will always be available

Good luck obtaining one.



