
Not sure the internet and publicly available data really reflect the "is" properly - I suspect the data to be biased in complex ways.



It's easy to forget most people have no significant online presence, if any, if you spend too much time with very online people.


It’s also important to remember that most of what is online is there to sell ads and does not represent reality in proportion. I think people are really trying too hard to find deep meaning everywhere; they might want to read more about the social sciences instead.


That's true.

That said, what can be found online does cover a lot of what offline people do and think and write, since there is a lot of stuff being brought online that wasn't produced online (books, news, ...)

OTOH, it's not clear how (or if) LLM training balances the different sources.


Even someone who has an online presence, I’ll use myself as an example, is so much more than the online presence represents.


Absolutely. Many sibling comments point out all the ways this is true - that this massive sample (and it is truly big data) is still a tiny speck of the human condition and spectrum of thought and experience.

It raises vital questions about where the centroid of this cloud of random stuff that people decided to input into the machine really lies. What is not represented in the model? Probably just about everything! Big new questions about objectivity and normalcy arise.

Is the average of everybody else's intelligence actually any use at all to an average individual? Does the average of everybody else's intelligence have a different kind of use to groups, companies, states, than the common utility of synthesising thought-like speech?



