
The author failed to mention the context of the data chosen, specifically the dates. For example, rob vs. steal, would change depending on the century the work was created.



The author mentions in the second paragraph that the data comes from scraping Wikipedia articles' plot descriptions. So the plots might be old, but the descriptions (and language) were all written recently.


Before drawing any strong conclusions, I'd probably want to do at least some validation against original sources, e.g. Project Gutenberg. You're talking about plot descriptions written by a mostly fairly narrow demographic. I'd be hesitant to use that to draw conclusions about the source material.


Hm. Wouldn't that have little bearing on the result? Can't really say "he poisons" when that wasn't the plot of the story.


It might have a large impact on the language, though. 'Empoison' used to be a verb, 'burgle' has largely been replaced with 'rob', and so on. I think this would tend to improve the data, though - 'empoison' and 'poison' ought to be grouped.
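A rough sketch of what that grouping could look like when counting pronoun/verb pairs in plot summaries. The variant map and the sample sentence are made up for illustration, and the regex is a crude stand-in for real parsing, so this is not the article's actual method:

```python
import re
from collections import Counter

# Hypothetical variant map: archaic or alternate forms -> one canonical verb.
CANONICAL = {"empoisons": "poisons", "empoison": "poison"}

def pronoun_verb_counts(text):
    """Count (pronoun, verb) pairs such as ('she', 'poisons'), merging variants."""
    counts = Counter()
    # Naive pattern: "he" or "she" followed directly by one lowercase word.
    for pronoun, verb in re.findall(r"\b(he|she)\s+([a-z]+)\b", text.lower()):
        counts[(pronoun, CANONICAL.get(verb, verb))] += 1
    return counts

# Invented sample text, not taken from any real plot description.
sample = "She empoisons the king. She poisons the wine. He robs the bank."
counts = pronoun_verb_counts(sample)
print(counts)
```

With the map in place, 'empoisons' and 'poisons' land in the same bucket instead of splitting the count.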


But that's the point. "She poisons" is more likely to be the plot of the story than "he poisons."


There could be a bias in the summary writers too, in that they prefer "he murders" and "she poisons" for the same method of killing.


> For example, rob vs. steal, would change depending on the century the work was created.

Eh?

Both come so naturally to me that I honestly have no idea which centuries you're talking about, or which you think is obsolete.


Steal has become relatively more popular over time although the difference is probably not large enough to attach a lot of significance to it.

https://books.google.com/ngrams/graph?content=rob%2C+steal&y...

If I had to guess why (other than reasons...) I'd probably speculate that steal tends to have a broader meaning while rob is more likely to be applied to physically robbing someone.


Interesting, playing around with that shows it's a lot tighter in Britain - where it's also the other way around in the past tense by quite a margin. (Likewise in the USA until recently.)

I'm British, so maybe that explains my initial confusion.


Comparing the past tenses is probably more appropriate as that's the tense that those verbs are most commonly used in. It also occurs to me that a lot of the use of just "rob" may well be the name "Rob." Given that, the usage of stole vs. robbed tracks quite closely with just a bit of an upward track of stole over the past couple of decades.

I originally thought that rob vs. steal might be one of those Germanic vs. French roots things, but both words seem to be traceable through Old English.


> a lot of the use of just "rob" may well be the name "Rob."

Google ngram search is case-sensitive though, so that's unlikely to account for much - and probably offset by sentences beginning with the verb, e.g. 'Rob him, quick!' said the man.
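To illustrate the point, here is a minimal sketch of a case-sensitive count over an invented sample. The function name and the sample text are hypothetical, and this only mimics the idea of case-sensitive matching, not how Google's ngram search works internally:

```python
import re
from collections import Counter

def case_sensitive_counts(text, forms=("rob", "Rob")):
    """Count exact-case occurrences of each form among the word tokens."""
    tokens = re.findall(r"[A-Za-z]+", text)
    counts = Counter(tokens)
    return {form: counts[form] for form in forms}

# Invented sample: one lowercase verb, one sentence-initial verb, one name.
sample = "Rob him, quick! They rob banks. Rob is my neighbour."
result = case_sensitive_counts(sample)
print(result)
```

The capitalised bucket mixes the name with sentence-initial uses of the verb, so neither count is a clean measure on its own.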



