9 Jun 2010

Green is no goal

This is a guest post by Joe Dunckley
To achieve a sufficiently large but distant win, it is worth sacrificing a much smaller but nearer win if the smaller one stands in the way, or distracts from and delays the larger achievement. To achieve a small but near win, it is not worth sacrificing a much larger but more distant one. But the difference in magnitude must be large enough, and the difference in distance small enough, for delayed gratification to really pay off. Speculation and debate over the sizes, distances, relative probabilities of success, and mutual incompatibilities of the competing achievements fuel many a political argument.
Like "green" open access.
Green open access is simple: for every scientific journal paper, at least one of the authors must take action to ensure that the paper is freely available to the world online, somehow. They can deposit the text in PubMed Central, or put a crude PDF of a draft version on their website. The method increasingly preferred by many advocates of green OA, though, is the institutional repository: each university library manages its own database of affiliated researchers' papers. This will solve the problem: the inability of people to read a paper that they want to read.
A heresy for you: access is not an interesting problem. The stubborn toll access publishers are correct when they say that most people can read most of the papers that they want to read. Yes, it takes emails to the authors, piracy amongst friends, and borrowed passwords; yes, it is a real problem; and no, the toll access publishers do not have any excuse to do nothing about it. But it's not an interesting problem any more. Letting us read a paper for free, without having to log in or pester the author, once the paper is twelve months old, is not a revolution.
There are other similarly dull problems in science and publishing that green OA doesn't address. Like how to save university libraries from the parasitic subscription access publishers that are slowly killing their helpless hosts. Green OA tells parasitic publishers that they can continue draining libraries of their budgets with subscription bundles of poor-quality journals that few people want to read, so long as they open access to the papers after twelve months. Now, as libraries face the greatest budget squeeze of the recession, is the perfect time for them to get some guts, speak up, say 'no', and shake off these parasites once and for all, before somebody comes along and hides them behind a bigger and stickier sticking plaster. Students should be rioting at the news that they are expected to do without textbooks and computers because their library has chosen instead to spend several tens of thousands of pounds on a package of obscure and substandard journals. Instead, we're distracted by green OA, and told that it is the one thing that academia desperately needs.
More interesting than these little problems are the opportunities that are currently presented to us: the real revolutions. Open, structured, reusable data has already demonstrated its revolutionary credentials in the field of genomics. Genome data that can be searched and mined by powerful computers and clever algorithms has enabled cheap and easy high-throughput hypothesis testing, and even hypothesis generation. It has led to countless discoveries that weren't on anybody's mind when they set out to collect the data, because the database as a whole is worth far more than the sum of the individual data-gathering experiments. There are vast quantities of data in the literature, from microscopy to biogeography, from epidemiological trends to drug toxicity, and there are great and important discoveries waiting to be made in that data. But they're not being made, because unlike with genomics, no organisation has made the effort to build the database, and no campaign group has achieved a mandate that the data be made open and reusable. Instead, the data, where it is available at all, is locked away in non-standard tables within unstructured PDF files, distributed across largely subscription access journals that reserve all rights to reuse. Gold open access, at least, by making literature mining possible, doesn't stifle these new open data opportunities, even if it's not the full solution; green open access, by focussing on the need for access to human-readable literature, distracts us from these possibilities entirely.
Or open notebook science (ONS): a model that, by getting scientists to discuss their ideas and publish their experiments in the open in real time, would force a revolution in the way that scientists work, the way that groups compete and collaborate, and the way that careers are evaluated and achievement rewarded; a revolution in the whole rhythm and pace of scientific discovery and of the individual scientist's working life. A revolution that makes the whole issue of access to journal papers go away altogether.
Green OA advocates argue that open science, open data, and open notebook science are vague and fantastical distractions from the pressing matter of human access to journal articles, and that we shouldn't waste time thinking about the former until we have solved the latter. I believe that green OA is a mundane and increasingly irrelevant distraction from the real problems that science could solve and the real opportunities it could grasp, opportunities that are available for a limited time only. The long-term achievements are too big to risk for the sake of such a small one.
Why spend time designing a better horse shoe when you could be inventing the railway train?

1 Jun 2010

Literature hacks: PubMed searches by RSS

This is a guest post by Joe Dunckley
There are all sorts of ways you could find out about new articles that you might want to read. There's that big room across campus that's full of old writings on paper, but that's too far away, they have some silly rule about not eating your lunch near their writings on paper, and anyway you're not sure you still have the card that lets you in. You can't trust your colleagues to point out an article that isn't crushingly mediocre, unless it concerns a species or a disease with a mildly, amusingly puerile name, and those ones are never even remotely related to your work. You subscribe to electronic tables of contents, but these days everyone's publishing in PLoS ONE, and you're not wading through their contents every week in the hope of finding the occasional thing that's relevant. You could regularly search PubMed, but that means typing in keywords over and over, and wading through the results asking yourself, "have I seen this paper already, or do I just feel like I've seen this paper already?"

So you could subscribe to email alerts for your PubMed searches, but my god, man, what the hell do you think you're doing? What, you haven't got enough email already? Does it make you feel special, having your phone stop you every five minutes with unimportant, impersonal notifications? If it's not private, not time-critical, and does not require a reply, it should not be pestering you with an email. That article has taken ten years to get from concept to publication; it can wait a little longer for you to read it. Not that you even read more than one in every twenty of the articles you're alerted to.

Which is why it should be obvious to our readers that they should be using HubMed's RSS feeds of PubMed searches, with Google Reader, to keep up with the literature. New articles will accumulate and be available to scroll through in the sophisticated and cleanly laid out environs of Google Reader, whenever it's convenient for you to read them. Reader will tick off items that you've seen and present to you items that you haven't yet seen, without ever screaming "look at me, look at me right now!"

Update: Since I originally wrote this, PubMed released their major update, introducing their own implementation of RSS saved searches, which looks at least as good as that of HubMed, and takes less effort to set up -- just click the RSS button next to the search box on the search results page.
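For the curious, the "show me only what I haven't seen yet" trick that makes a feed reader so much kinder than email alerts can be sketched in a few lines. This is a minimal Python sketch using only the standard library, assuming an RSS 2.0 feed in which each item carries a unique `<guid>` (falling back to `<link>`). The sample feed, its titles and its `pmid:` identifiers are hypothetical, and this is not how Google Reader, HubMed or PubMed actually implement it.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical RSS 2.0 feed standing in for a saved PubMed search.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example saved search</title>
    <item><title>Paper A</title><guid>pmid:1</guid></item>
    <item><title>Paper B</title><guid>pmid:2</guid></item>
  </channel>
</rss>"""


def unseen_items(feed_xml, seen):
    """Return (id, title) pairs for items not yet in `seen`, and mark them seen."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        # Use the guid as the item's identity; fall back to the link if absent.
        item_id = item.findtext("guid") or item.findtext("link")
        if item_id and item_id not in seen:
            seen.add(item_id)
            fresh.append((item_id, item.findtext("title", default="")))
    return fresh


if __name__ == "__main__":
    seen = set()
    # First visit: both papers are new.
    print(unseen_items(SAMPLE_FEED, seen))
    # Second visit to the same feed: nothing new to pester you with.
    print(unseen_items(SAMPLE_FEED, seen))
```

Keeping the `seen` set between visits is the whole point: the feed can repeat old items as often as it likes, and you only ever get shown the new ones.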