
> Ngram says toast almost vanishes from the English language by 1980, and then it pops back up.

The Ngram plot does not say that. It shows usage dropping ~40% (since 1800). It’s indeed a problem that the graph’s Y axis doesn’t go to zero, as others have pointed out. But did the etymonline authors really not notice this before declaring incorrectly what it says? I would find that hard to believe (especially considering the subsequent “see, no dip” example that has a zero-based Y axis and a small but visible plateau around 1980), and it’s ironic considering the hyperbolic and accusatory title and opening sentence.




The graph axis isn't the only problem. The word "toast" did not drop in usage by 40%; Google's dataset shifted dramatically towards a different genre than it was composed of previously. I've been in conversations with people trying to explain those drops in the 70s, and no one (myself included) realized that the cause was such a dramatic flaw in the data.
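
To make the composition effect concrete, here is a rough sketch with made-up numbers. The genre names, per-genre rates, and corpus shares are all hypothetical and are not taken from the Ngram data; the point is only that the pooled frequency can fall when the genre mix changes, even if usage within each genre stays flat.

```python
def pooled_rate(shares, rates):
    """Overall rate of a word in a corpus that mixes several genres.

    shares: fraction of the corpus belonging to each genre (sums to 1)
    rates:  occurrences of the word per million words within each genre
    """
    return sum(shares[genre] * rates[genre] for genre in shares)


# Hypothetical per-genre rates, held constant across both periods.
rates = {"fiction": 20.0, "academic": 2.0}

# Hypothetical corpus composition before and after the genre shift.
early = pooled_rate({"fiction": 0.5, "academic": 0.5}, rates)
late = pooled_rate({"fiction": 0.2, "academic": 0.8}, rates)

# Relative drop in the pooled rate, with no change in how anyone writes.
drop = 1 - late / early
print(f"early={early:.1f}, late={late:.1f}, drop={drop:.0%}")
```

With these invented inputs the pooled rate falls by nearly half, purely because the corpus now contains proportionally less of the genre where the word is common.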


That’s fair. The article has a very valid point, which would be made even stronger without the misreading of the plots they’re critiquing, whether it was accidental or intentional. I always thought Ngrams were weird too; I remember thinking in the past that some of the dramatic shifts it shows were unlikely.


Is there no way to filter out particular data sets? This seems like a pretty huge limitation.


Sort of, but it's pretty blunt. You can select between a few different English corpora, but it's basically fiction versus everything, with nothing finer-grained than that.



