> For quality control, I looked only at comments with Reddit score > 100
That's a non-trivial popularity threshold. Also, since it's an absolute score, it will bias against smaller subreddits, where reaching 100 points is hard for any comment.
This is much less "how people talk on Reddit", and much more "the type of comment that gets upvoted on the default subreddits".
Yikes, that sounds like a great way to bias your data away from controversial opinions about weed. That would be like taking an exit poll of only people wearing lots of political apparel.