It depends entirely on what the data set is, and to conclude that it's "wrong" y...

ramblenode · on Sept 27, 2023

> The authors, on the other hand, are claiming to be authoritative and thus the burden of evidence on their claims is far far far higher.

From what I read, the authors are only claiming that some Google n-grams fail the common-sense test and that the data shouldn't be considered rigorous.

"said" is in the top 300 most frequent English words, according to Wiktionary. For its usage to halve over 80 years and then double over the next 20 would represent a profound shift in English, one that linguists would certainly know about.

Or, as with "toast", one could simply doubt the veracity of the data.
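To put rough numbers on how sharp that swing would be, here is a back-of-envelope sketch. It assumes, purely for illustration, a smooth constant-rate change in each period rather than the actual Ngram curve, and `implied_annual_rate` is a hypothetical helper, not anything from the Ngram Viewer.

```python
def implied_annual_rate(factor: float, years: float) -> float:
    """Constant yearly rate that multiplies a quantity by `factor` over `years` years."""
    return factor ** (1 / years) - 1


# Hypothetical smooth trajectory for "said": halve over 80 years, then double over 20.
decline = implied_annual_rate(0.5, 80)
rebound = implied_annual_rate(2.0, 20)

print(f"decline: {decline:.2%} per year")
print(f"rebound: {rebound:.2%} per year")

# Net effect over the whole century: back to roughly where it started.
net = (1 + decline) ** 80 * (1 + rebound) ** 20
print(f"net change over 100 years: x{net:.3f}")
```

Under that assumption, the rebound has to run at about four times the yearly pace of the decline, for one of the most common words in the language.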

lolc · on Sept 26, 2023

The way I read it, the article was a rant about how people shouldn't be using ngrams to prove things.

ehaliewicz2 · on Sept 29, 2023

According to this page (https://books.google.com/ngrams/info), if you want to write a paper based on their results (why would you do this against a cute dataset?), make sure to cite their very authoritative-sounding paper "Quantitative Analysis of Culture Using Millions of Digitized Books".