Not sure the internet and publicly available data really reflect the "is" proper...

Kye · on Jan 1, 2024

It's easy to forget most people have no significant online presence, if any, if you spend too much time with very online people.

mlinhares · on Jan 1, 2024

It’s also important to remember that most of what is online is there to sell ads and does not represent reality in the same quantity. I think people are really trying too hard to find deep meaning everywhere; they might want to read more about the social sciences instead.

ithkuil · on Jan 1, 2024

That's true.

That said, what can be found online does cover a lot of what offline people do and think and write, since there is a lot of stuff being brought online that wasn't produced online (books, news, ...)

OTOH, it's not clear how (or if) LLM training balances the different sources.

bjelkeman-again · on Jan 1, 2024

Even someone who has an online presence, I’ll use myself as an example, is so much more than the online presence represents.

nonrandomstring · on Jan 1, 2024

Absolutely. Many sibling comments point out all the ways this is true - that this massive sample (and it is truly big-data) is still a tiny speck of the human condition and spectrum of thought and experience.

It raises vital questions about where the centroid of this cloud of random stuff that people decided to input into the machine really lies. What is not represented in the model? Probably just about everything! Big new questions about objectivity and normalcy arise.

Is the average of everybody else's intelligence actually any use at all to an average individual? Does the average of everybody else's intelligence have a different kind of use to groups, companies, and states than the common utility of synthesising thought-like speech?