[EDIT: Available at the following URL with proper hyperlinks, per a suggestion in the comments: http://michaelnielsen.org/blog/open-access-a-short-summary/]
The topic of open access to scientific papers comes up often on Hacker News.
Unfortunately, those discussions sometimes bog down in misinformation and misunderstandings.
Although it's not exactly my area of expertise, it's close --- I've spent the last three years working on open science.
So I thought it might be useful to post a summary of the current state of open access. There's a lot going on, so even though this essay appears lengthy, it's actually a very brief and incomplete summary of what's happening. I have links to further reading at the end.
This is not a small-stakes game. The big scientific publishers are phenomenally profitable. In 2009, Elsevier made a profit of 1.1 billion dollars on revenue of 3.2 billion dollars. That's a profit margin of roughly 34%, and it's a margin (and business model) they are very strongly motivated to protect. They're the biggest commercial journal publisher, but the other big publishers are also extremely profitable.
Even not-for-profit societies often make an enormous profit on their journals. In 2004 (the most recent year for which I have figures) the American Chemical Society made a profit of 40 million dollars on revenues of 340 million dollars. Not bad! This money is reinvested in other society activities, including salaries. Top execs receive salaries in the $500k to $1m range (as of 2006; I'm sure it's quite a bit higher now: http://www.chemistry-blog.com/2008/01/02/acs-executive-compensations-for-2006/).
The traditional publishers make money by charging journal subscription fees to libraries. Why they make so much money is a matter for much discussion, but I will merely point out one fact: there are big systematic inefficiencies built into the market. University libraries for the most part pay the subscription fees, but they rely on guidance (and often respond to pressure) from faculty members in deciding what journals to subscribe to. In practice, faculty often have a lot of power in making these decisions, without bearing the costs. And so they can be quite price-insensitive.
The journal publishers have wildly varying (and changing) responses to the notion of open access.
For example, most Springer journals are closed access, but in 2008 Springer bought BioMedCentral, one of the original open access publishers, and by some counts the world's largest. BioMedCentral continues to operate as an open access publisher. (More on the deal here: http://www.earlham.edu/~peters/fos/2008/10/springer-buys-biomed-central.html)
[Edit: It has been pointed out to me in email that Springer now uses a hybrid open access model for most of their journals, whereby authors can opt to pay a fee to make their articles open access. If the authors don't pay that fee, the articles remain closed. The other Springer journals, including BioMedCentral, are fully open access.]
Nature Publishing Group is also mostly closed access, but has recently started an open access journal called Scientific Reports, apparently modelled after the (open access) Public Library of Science's journal PLoS One.
It is sometimes stated that big commercial publishers don't allow authors to put free-to-access copies of their papers on the web. In fact, policies vary quite a bit from publisher to publisher. Elsevier and Springer, for example, do allow authors to put copies of their papers on their websites and into institutional repositories. This doesn't mean that it always (or even often) happens, but it's at least possible in principle.
Comments on HN sometimes assume that open access is somehow a new issue, or an issue that no-one has been doing anything about until recently.
This is far from the case. Take a look at the Open Access Newsletters at http://www.earlham.edu/~peters/fos/newsletter/archive.htm and you'll realize that there's a community of people working very, very hard for open access. They're just not necessarily working in ways that are visible to hackers.
Nonetheless, as a result of the efforts of people in the open access movement, a lot of successes have been achieved, and there is a great deal of momentum toward open access.
Here are a few examples of success:
In 2008 the US National Institutes of Health (NIH) (by far the world's largest funding agency, with a budget of more than $30 billion a year) adopted a policy requiring that all NIH-funded research be made openly accessible within 12 months of publication. See, e.g.: http://www.earlham.edu/~peters/fos/nihfaq.htm
All 7 UK Research Councils have adopted similar open access policies requiring researchers they fund to make their work openly accessible.
Many universities have adopted open access policies. Examples include:
Harvard's Faculty of Arts and Sciences: see http://www.earlham.edu/~peters/fos/2008/02/more-on-imminent-oa-mandate-at-harvard.html
MIT: http://www.earlham.edu/~peters/fos/2009/03/mit-adopts-university-wide-oa-mandate.html
Princeton: http://www.dailyprincetonian.com/2011/09/29/28869/
As a result of policies like these, in years to come you should see more and more freely downloadable papers showing up in search results.
Note that there are a lot of differences of detail between the policies, and those details can make a big difference to their practical impact. I won't try to summarize all the nuances here; I'm merely pointing out that there is a lot of institutional movement.
Many more pointers to open access policies may be found at http://roarmap.eprints.org/. That site notes 52 open access policies from grant agencies, and 135 from academic institutions.
There's obviously still a long way to go before there is universal open access to publicly-funded research, but there has been a lot of progress, and a lot of momentum.
One thing I hope will happen is that the US Federal Research Public Access Act passes. First proposed in 2006 (and reintroduced in 2010), this Act would essentially extend the NIH policy to all US Government-funded research (from agencies with budgets over $100 million). My understanding is that at present the Act is tied up in committee.
Despite (or because of) this progress, there is considerable pushback on the open access movement from some scientific publishers. As just one instance, in 2007 some large publishers hired a very aggressive PR firm to wage a campaign to publicly discredit open access: http://www.scientificamerican.com/article.cfm?id=open-access-to-science-un
I will not be surprised if this pushback escalates.
What can hackers do to help out?
One great thing to do is start a startup in this space. Startups like Mendeley, ChemSpider, BioMedCentral, PLoS and others have had a big impact over the past ten or so years, but there are even bigger opportunities for hackers to really redefine scientific publishing. Ideas like text mining, recommender systems, open access to data, automated inference, and many others can be pushed much, much further.
I've written about this in the following essay: http://michaelnielsen.org/blog/is-scientific-publishing-about-to-be-disrupted/. Many of those ideas are developed in much greater depth in my book on open science (http://michaelnielsen.org/blog/reinventing-discovery/).
For less technical (and less time-consuming!) ways of getting involved, you may want to subscribe to the RSS feed at: http://www.taxpayeraccess.org/action/index.shtml. This organization (the Alliance for Taxpayer Access) was crucial in lobbying for the NIH open access policy, and they're involved in lobbying for the Federal Research Public Access Act, as well as other open access efforts.
If you want to know more, the best single resource I know is Peter Suber's website: http://www.earlham.edu/~peters/hometoc.htm.
Suber has, for example, written an extremely informative introduction to open access (http://www.earlham.edu/~peters/fos/overview.htm). His still-active Open Access Newsletter (http://www.earlham.edu/~peters/fos/newsletter/archive.htm) is a goldmine of information, as is his (no longer active) blog (http://www.earlham.edu/~peters/fos/fosblog.html). He also runs the open access tracking project: http://twitter.com/#!/OATP.
If you got this far, thanks for reading! Corrections are welcome.
A few examples:
http://www.sigir.org/forum/F2001/sigirFall01Letters.html
http://www.math.columbia.edu/~woit/wordpress/?p=442
http://www.math.columbia.edu/~woit/wordpress/?p=581
JMLR, at least, has gone on to eclipse the journal it was intended to replace, Machine Learning (it has about 3x the impact factor).
I do notice that the examples I find are all in computer science and mathematics, and the new journals have basically zero budgets (and like it that way) and don't charge authors any fees. Is this because in CS/math a common expectation is that the author can use LaTeX, produce their own figures, and submit a print-ready PDF, whereas in other fields authors expect significant formatting work to be done by the journal?