More restrictive licensing wouldn’t be enough. This stuff is sufficiently transf...

ad404b8a372f2b9 · on June 23, 2022

I don't think fair use comes into play for this matter. It's the use of the dataset for training that's being objected to, not its reproduction during inference.

6gvONxR4sf7o · on June 23, 2022

But then what about training on Hacker News comments, for example? Without permission from Y Combinator or the posters, there’s even less permission than any curated dataset could provide or revoke.