More restrictive licensing wouldn’t be enough. This kind of use is sufficiently transformative to count as fair use without any permission at all from the data owner. New laws will be required to address it.
I don't think fair use comes into play for this matter. It's the use of the dataset for training that's being objected to, not its reproduction during inference.
But then what about training on Hacker News comments, for example? Neither Y Combinator nor the posters gave permission, so there’s even less permission involved than any curated dataset could provide or revoke.