The assumption here is that future iterations of the technology require "BS"-riddled datasets. I question that assumption for two reasons: the technology can probably improve using existing datasets alone, and we don't know that "synthetic" data can't improve it.