As an NLP researcher, I see lots of NLP papers among the top-cited ones for AAAI. I think that the high citation count is due to
(i) people in NLP actually bothering to tell others why their approach is interesting
(ii) other people being interested in the same / a similar kind of thing [avoiding the discipline-of-one problem that niche AI applications would have] and
(iii) NLP having a reasonably developed "canon" about what counts as must-cite papers. This canon is heavily biased towards US work, and towards people who write decent explanations of what they do, but at least it makes sure that people know about the big problems and failed (or not-quite-failed-as-badly-as-the-others) solutions.
What you see in other conferences is that the "Best paper" awards go to (i) more theoretical papers which still have issues to solve before people can use the approach (nothing wrong with those!), in (ii) subfields that are currently "hot". Whereas the most-cited papers are (i) more obviously about things that a dedicated person could apply in practice, and (ii) in a subfield that is obscure at the time but will become more popular in the following years.
Reviewing SIGMOD, it appears that a lot of the citations earned are less about innovative research, and more about everyone using the software tools they published.
And a survey paper in the field of big data analysis (survey papers are citation bait, but won't be pulling in many grants or awards).
Once a paper gets big enough you basically have to cite it any time you touch on anything vaguely related just to prove that you are aware of it, almost as a shibboleth. My wife is an academic and on more than one occasion she has gotten comments back from peer review along the lines of "good article, but why haven't you cited famous papers X, Y and Z".
I wonder if it's more subtle than that - if your paper has associated source code, then it's likely that people reading it might try it out, and the ideas that you've presented will stick around more than a short paper with no follow-up material.
In other words, papers are only a short glimpse into your research; presenting code allows an interested reader to look deeper, and means that they're more likely to remember what you've done, and cite it later.
Is it possible that papers that get the awards help give the scientists new ways of looking at problems, while the papers that are frequently cited are more likely to follow established viewpoints and back them with hard data I can use to justify later experiments?
What I mean is, if a paper makes me think "wow, I've never thought of this that way before, I wonder if I could try something like that with this...." I probably wouldn't cite it, right? It's not directly related. But I would probably give it an award for best paper because it helped me come up with a new approach to my own problem.
That's an interesting idea. For what it's worth, I would cite the publication which prompted the idea behind my approach, so I don't think this is the answer to your question.
That being said, I think you're onto something. I was recently involved in a project leading to a paper for a conference, and while I don't know whether it'll get accepted (meaning I don't know if it's even representative of the kinds of papers conferences want to see), I notice in hindsight that we had only a few citations that really informed a totally novel methodological approach we used. By contrast, we cited a ton of research in the domain area we studied.
So my takeaway is that it's hard to bring in dozens of papers to inform one's methods, because methods (like a protocol) require one somewhat coherent narrative. Integrating the collective body of knowledge about a topic (like, oh let's say.. online labor markets) is a lot easier, making "rapid-fire" citations in that context more likely.
I'd like to see a seasoned researcher (or at least someone with the experience of a few publications) weigh in on this though.
Another issue is that "best paper" competitions tend to attract younger scientists who don't have the same "star power" to generate citations.
Star power in science is nothing to scoff at either. If Einstein was still about and writing papers you can be sure everyone and their grandmother would cite them, even if the papers were useless. Now, while that is an extreme example, many fields have their own "stars" that are credited with making significant gains in a particular area.
This is certainly the case.
On the other hand, "best papers" are not necessarily the most important papers. Often politics, e.g., proportional representation of all important sub-fields among the best paper nominees, decides whether a paper gets named "best" or not.
Anecdotally, I know of awesome papers that were not selected as "best paper" and 'cute idea' papers without substance that got a "best paper" award.
Nitpicking, but why are they claiming to provide MAP (mean average precision) scores when their description and equation indicate that they are computing average precision, not MAP? According to the definition of MAP [1] that they link to, MAP is computed across multiple queries while average precision is computed for one [2]. Furthermore, they truncate their calculation to only consider the top 3 cited papers (i.e., they don't go all the way to 100% recall), so it's not even really the average precision.
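To make the distinction concrete, here is a minimal Python sketch. It is my own illustration, not the formula from the page being discussed: the function names and the toy relevance lists are made up, and the truncated variant keeps dividing by the total number of relevant items, which is only one of several conventions in use.

```python
from typing import Optional, Sequence


def average_precision(relevance: Sequence[bool],
                      cutoff: Optional[int] = None) -> float:
    """Average precision for ONE ranked list (one query).

    relevance[i] is True if the item at rank i + 1 is relevant.
    If cutoff is given, only the first `cutoff` ranks are scanned
    (assumption: we still normalize by the total number of relevant items).
    """
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    ranked = relevance if cutoff is None else relevance[:cutoff]
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_relevant


def mean_average_precision(queries: Sequence[Sequence[bool]],
                           cutoff: Optional[int] = None) -> float:
    """MAP: the mean of the per-query average precisions."""
    return sum(average_precision(q, cutoff) for q in queries) / len(queries)


# Hypothetical toy data: two ranked result lists.
query_a = [True, False, True, False, True]
query_b = [False, True, False, False]

ap_full = average_precision(query_a)             # one query, full ranking
ap_top3 = average_precision(query_a, cutoff=3)   # one query, top 3 only
map_full = mean_average_precision([query_a, query_b])  # across queries
```

With a single ranked list there is nothing to take a mean over, so the score is just an average precision; cutting the list at rank 3 drops the relevant item at rank 5 and lowers the score further.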
Conference organizers are well aware that best paper awards are not perfect predictors of importance or popularity. Many top conferences specifically introduced separate awards ("most influential", "test of time", etc.) granted e.g. 10 years after publication.
Oftentimes, the truly revolutionary ideas (and most likely to be cited in future papers) are those that are not understood fully or perhaps even reviled, and thus likely to not be awarded any significant honor on introduction.
Einstein won his Nobel for work on the photoelectric effect, not relativity, though obviously the latter mattered more.
> Einstein won his Nobel for work on the photoelectric effect, not relativity, though obviously the latter mattered more.
The photoelectric effect led to quantum mechanics and the associated science and technology. I don't think it's obvious that relativity mattered more. If you judge by day-to-day use, one could plausibly argue that our understanding of quantum mechanics has a bigger impact.
My high school science teacher told us that he was given the prize for the photoelectric effect because that was a well understood mechanism by the time of the awarding. It was generally understood that relativity was going to be a bigger deal (concept wise), but since it was still so hotly debated / not yet thoroughly accepted, the commission wanted to ensure that he was given a prize before his death (the award is not given posthumously for some reason) and given for something that wouldn't be later rejected.
"In 1905 Albert Einstein published a paper that explained experimental data from the photoelectric effect as being the result of light energy being carried in discrete quantized packets. This discovery led to the quantum revolution."
I don't think that they even expect them to be predictors of importance or popularity. The "best paper" award can give a hint of temporary fashions or about what seemed to be interesting concepts/works at a certain point in time. That's why it seems especially interesting to see them compared with the "top cited" in these charts.
I think it's far more interesting to just see what the top cited papers are every year (after the fact) than to compare with the best paper. Best paper awards are given for a lot of reasons that aren't consistent across conferences or even across years of the same conference.
A cursory browse shows an interesting pattern in the names of the researchers.
Edit: Perhaps I should clarify: It's entirely possible that exposure in the West has a large part to do with the media, who often don't wade too deeply into scientific matters.