Claude Shannon's original 1948 paper "A Mathematical Theory of Communication" launched the entire field of information theory. It's 50 pages, highly readable, and pedagogical. The source of its magic is that Shannon introduces and concretely grounds an essentially new ontological concept of vast applicability. And it has 100,000 citations.
This could easily have been 3-4 landmark papers, but instead it's packed into one cogent idea.
That common interview question about query autocomplete/sentence completion? Shannon solved it and demonstrated it in this paper, almost a decade before FORTRAN existed. New grads still struggle with that problem. PhDs still struggle with that problem.
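The demonstration in question is Shannon's series of statistical approximations to English (Section 3 of the paper): he generates text by repeatedly sampling the next word from the distribution of words that followed the current one in real English. A minimal sketch of that idea as naive autocomplete, with an illustrative toy corpus (not Shannon's data) and made-up function names:

```python
# Sketch of Shannon-style "second-order word approximation":
# sample each next word from the words observed to follow the
# current word in a training corpus.
import random
from collections import defaultdict

def build_bigram_model(text):
    """Map each word to the list of words seen to follow it."""
    words = text.split()
    followers = defaultdict(list)
    for w, nxt in zip(words, words[1:]):
        followers[w].append(nxt)
    return followers

def complete(model, prompt, n_words=5, seed=0):
    """Naive autocomplete: repeatedly sample a likely next word."""
    rng = random.Random(seed)
    out = prompt.split()
    for _ in range(n_words):
        choices = model.get(out[-1])
        if not choices:          # dead end: word never seen mid-corpus
            break
        out.append(rng.choice(choices))
    return " ".join(out)

corpus = ("the head and in frontal attack on an english writer that "
          "the character of this point is therefore another method")
model = build_bigram_model(corpus)
print(complete(model, "the", n_words=4))
```

Every consecutive word pair the sketch emits actually occurs in the corpus, which is exactly why Shannon's samples read as eerily English-like.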
Pretty much every machine learning classifier is using a loss function described in that paper.
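The loss in question is cross-entropy, which generalizes Shannon's entropy formula H = -Σ p·log p: it measures the average number of bits needed to encode outcomes from the true distribution p using a code built for the model's distribution q. A minimal sketch with illustrative probabilities (the numbers are made up, not from the paper):

```python
# Cross-entropy in bits: H(p, q) = -sum_i p_i * log2(q_i).
# When p == q this reduces to Shannon's entropy H(p).
import math

def cross_entropy(p, q):
    """Average bits to encode samples from p with a code fit to q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p_true = [1.0, 0.0, 0.0]        # one-hot label: class 0
q_good = [0.9, 0.05, 0.05]      # confident, correct model
q_bad  = [0.1, 0.45, 0.45]      # model spreading mass on wrong classes

print(cross_entropy(p_true, q_good))  # small loss
print(cross_entropy(p_true, q_bad))   # larger loss
```

Minimizing this over a dataset is exactly what "log loss" training of classifiers does, usually in nats (natural log) rather than bits.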
I always thought it'd be a really cool start-up idea: a service that prints, binds, etc. a paper like this and mails it to you within a few days.
I probably have 10 papers floating around as loose pages. Annoying. I print them when I want to read them, but of course I rarely can do so immediately.
http://math.harvard.edu/~ctm/home/text/others/shannon/entrop...