It's trained on Twitter data so I assume Reddit data as well.
Honestly, they both feel like pretty important datasets to ingest if you're trying to build a model on human speech. I reckon social media, comment sections and co have the most natural human conversational text online.