Hacker News

It depends entirely on what the data set is, and to conclude that it's "wrong" you'd have to consider the underlying data too. Google ngrams makes no claim to be a consistent, benchmark-type data set. Over time the content it's based on shifts, which can cause effects like this.
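A toy sketch of how a shifting corpus alone can move a word's frequency. The documents below are made up (not real Ngrams data), and it assumes, as a simplification, that the plotted value is a word's count divided by the total token count for that year. The text of each genre never changes; only the genre mix does, yet the relative frequency of "said" drops by more than a factor of four.

```python
from collections import Counter


def relative_frequency(corpus, word):
    """Share of all whitespace-separated tokens in `corpus` (a list of documents) equal to `word`."""
    counts = Counter()
    for doc in corpus:
        counts.update(doc.lower().split())
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


# Hypothetical toy documents: fiction uses "said" a lot, technical prose never does.
fiction = "she said no and he said yes"
technical = "the sample was heated and the result was measured"

# Same two documents, different mix per "year" (illustrative proportions only).
year_a = [fiction] * 8 + [technical] * 2
year_b = [fiction] * 2 + [technical] * 8

if __name__ == "__main__":
    print(relative_frequency(year_a, "said"))
    print(relative_frequency(year_b, "said"))
```

Nothing about how people use "said" changed between the two "years"; only what got scanned did.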

To make any sort of claim like "this word's usage changes over time" in an academic sense, you'd need to include a discussion of the data sources you used and why those are representative of word usage over time. The fact that they'd even try to use Google ngrams in this way shows how little they actually researched the topic.

Google ngrams is a cute data set that can sometimes show rough trends, but it's not some "authoritative source on usage over time" and it doesn't claim to be.

The authors, on the other hand, are claiming to be authoritative, and thus the burden of evidence on their claims is far, far, far higher. I didn't even get into their completely subjective and vague accusations of "AI" somehow doing something bad. Ngrams don't involve AI; it's simple word counting.




> The authors, on the other hand, are claiming to be authoritative and thus the burden of evidence on their claims is far far far higher.

From what I read, the authors are only claiming that some Google n-grams fail the common-sense test and that the data shouldn't be considered rigorous.

"said" is in the top 300 most frequent English words, according to Wiktionary. For its usage to halve over 80 years and then double again within 20 would represent a profound shift in English that would certainly be known to linguists.

Or, as with "toast", one could simply doubt the veracity of the data.


The way I read it, the article was a rant about how people shouldn't be using ngrams to prove things.


According to this page (https://books.google.com/ngrams/info), if you want to write a paper based on their results (why would you do that with a cute dataset?), you should cite their very authoritative-sounding paper "Quantitative Analysis of Culture Using Millions of Digitized Books".



