The author mentions in the second paragraph that the data comes from scraping Wikipedia articles' plot descriptions. So the plots might be old, but the descriptions (and language) were all written recently.
Before drawing any strong conclusions, I'd probably want to do at least some validation against original sources, e.g. Project Gutenberg. You're talking about plot descriptions written by a fairly narrow demographic, so I'd be hesitant to use them to draw conclusions about the source material.
It might have a large impact on the language, though. 'Empoison' used to be a verb, 'burgle' has largely been replaced with 'rob', and so on. If anything, I think this would tend to improve the data - 'empoison' and 'poison' ought to be grouped.
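To make the grouping point concrete, here is a rough sketch of what merging an archaic verb into its modern form before counting might look like. The `CANONICAL` mapping and the function names are made up for illustration and are not taken from the article's analysis; whether a pair like 'burgle'/'rob' should also be merged is a judgment call, so it is left out.

```python
from collections import Counter

# Hypothetical mapping from archaic verb forms to a modern canonical form.
# Only the pair named in the comment above is included.
CANONICAL = {"empoison": "poison"}

def normalize(verb: str) -> str:
    """Lowercase the verb and map it to its canonical form if one is listed."""
    v = verb.lower()
    return CANONICAL.get(v, v)

def count_verbs(verbs):
    """Count verbs after normalization, so grouped forms share one tally."""
    return Counter(normalize(v) for v in verbs)

# Made-up sample input, not real data from any corpus.
counts = count_verbs(["poison", "Empoison", "rob", "Burgle", "kill"])
```

With this, 'Empoison' and 'poison' land in the same bucket, while 'burgle' and 'rob' stay separate unless someone decides to add them to the mapping.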