30 Aug 2010

What is the scientific paper? 2: What's wrong?

This is a guest post by Joe Dunckley
Once again, this is a re-post of something I wrote on my old blog a year ago after the Science Online conference, looking at the future of the scientific paper. As I reminded people at the time, these were just my own half-thought-through ideas, not the policy or manifesto of anyone or anything I'm affiliated with.
So in response to the Science Online conference, we've been thinking about the question, "what is the scientific paper?" I already gave my answer to that a couple of weeks ago, but promised to have a go at answering the more interesting question, "what is wrong with the scientific paper?"
I've been thinking through how to sum up the answer all week, and I'm afraid the simple answer is, "the journal". The journal is what's wrong with the scientific paper. Or rather, the journal is what is holding back the development of efficient modern methods of disseminating science. So I thought I'd spend this second post making some observations on what the scientific journal traditionally is and does; what I think the modern journal shouldn't be doing; and a couple of case studies of alternative technologies that disseminate certain kinds of scientific communications better than a journal ever could.

What is the (traditional) scientific journal?
  • The journal is a collection of scientific papers limited to some kind of theme, coherent enough to make it worth reading and buying.
  • The journal is led by a charismatic editor-in-chief and editorial board who attract people to publish in the journal.
  • The journal is printed on pages. It can do text, still pictures, graphs, and small tables.
  • The journal publishes a sufficiently large number of papers to make it worth printing several issues each year, but a sufficiently small number of papers to make each issue manageable.
  • The purpose of the journal is to be read and cited by other scientists.
  • The purpose of the journal is to be purchased by university libraries.
  • The journal provides a peer-review, copy-editing, marketing and media relations service to its scientists.
  • Publishing in a journal provides a way for scientists to be cited and credited for their work, based on the reputation of that journal.
  • The journal decentralises scientific publishing, allowing individual pockets of innovation within the publishing world, but making change overall very slow.
What should the modern journal (not) be doing?
It is perhaps rather foolish for somebody who works for a publisher of journals -- who works developing technologies for a publisher of journals -- to say that the problem with publishing science is the journal. It would be even more foolish for me to say that publishers perhaps shouldn't be trying to fix the problem with technology. Here are a couple of interesting technological advances that the more forward-thinking journals have come up with lately.
  • At Sci Online, Theo Bloom demonstrated iSee, a structural biology visualisation applet for your "supplementary information". In the same category is J. Cell Biol's DataViewer, which is presented to us as a device for visualising raw microscopy data. Did you know that the results that come out of modern microscopes are not just pretty static pictures, but vast datasets full of hidden information? The JCB DataViewer unlocks that hidden information by providing the data, and an interface to it, as "supplementary information" with a paper (there's a short sketch of what such data looks like after this list).
  • PLoS Currents: all the constraints and benefits of a traditional journal, but without the peer review. Solves the problem of delays in publication. Publishes items that look just like the traditional paper.
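To make the "vast datasets" point concrete: a modern microscopy run is typically a multi-dimensional array (time, z-slice, channel, y, x), not one flat picture. Here is a minimal sketch of inspecting such a stack, assuming a hypothetical local OME-TIFF file and the third-party tifffile library:

```python
# Minimal sketch: inspect the dimensions of a (hypothetical) OME-TIFF
# microscopy stack. Assumes the third-party tifffile library and a local
# file named "experiment.ome.tif".
import tifffile

stack = tifffile.imread("experiment.ome.tif")

# A typical stack is far more than a pretty static picture: the array
# might have shape (time, z, channel, y, x), with thousands of frames.
print("dimensions:", stack.shape, "dtype:", stack.dtype)
print("total pixels:", stack.size)
```

Every axis of that array is information that a flat figure in a PDF throws away.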
Should publishers and journals be doing these things? When you look more closely at JCB's DataViewer, you find that, useful though it may be, most of its power and potential is currently wasted. The DataViewer is presented to us as a device for visualising the supplementary information of a paper; in fact, it is a potentially important database of microscopy datasets with a handy graphical interface attached. Restricted to a single journal, the database functionality lies unused.
PLoS Currents? This is supposed to be a solution to the problem of delays in publishing special types of science deemed to be important and timely enough to need rapid communication to peers in the field. What have PLoS done? What makes PLoS Currents unique? How does it speed up intra-field communication of those important results? It drops one single aspect of the paper: peer review. In all other respects, PLoS Currents does all it can to make its papers look like the scientific paper, and its "journal" look like the scientific journal. Scientists are still asked to spend hours writing up these important timely results, with an abstract, introduction, methods, results, conclusions and references, plus select figures, graphs and tables. Nobody has the imagination to go beyond the paper-journal-publisher model. We would sooner give up peer review than publish science in anything that doesn't look like papers have looked for a century.
Or how about Journal of Visualised Experiments? JOVE is, for some inexplicable reason, held up as a brilliant example of innovation in publishing science -- of making the most of the new technology provided by the web. Those who point out that, well, it's not really a "journal", is it?, are chastised for their own lack of imagination. But surely it's those who can't conceive of a publishing format branded as anything other than the "Journal of ..." who are lacking the imagination.
Final example: while thinking about this post, PLoS Computational Biology kindly came up with the absurd idea of being a software repository. NO! Software repositories already make perfectly good software repositories, and there are plenty of them. Trying to turn a journal into a software repository is a suboptimal solution to a problem that disappeared long ago -- long before scientific publishers could have imagined that the problem even existed.
Breaking out of the journal
The web makes all sorts of new methods of publishing, communicating, disseminating science possible. It also comes with all sorts of well developed and widely used solutions to the problems of disseminating science. The big old publishers haven't even realised the web has happened, let alone thought about what to do with it. The hip young publishers know what's possible, and they want to be the ones to realise the possibilities. Good on the hip young publishers. But with each new possibility, scientists should be asking whether publishers, even the hip young ones, are really right for the job. Sometimes they are. Sometimes not.
GenBank, the database of gene sequences and genome projects, had to happen. Journals simply can't publish the raw results from a whole genome sequencing project. (Though I don't suppose they gave up without trying.) And GenBank comes with dozens of benefits that papers, when spread across a decentralised system of journals, just can't have. Yes, I know that databases aren't the optimal solution for every variety of data, but they are suitable -- desirable, even required -- for more of them than you might think. The microscopy data in the JCB DataViewer (or the structural data in iSee) would, I suspect, be of much greater value branded as a standalone public database with a fancy front-end than as a fancy visualisation applet for some scattered and hidden supplementary files, restricted to a single journal.
Like it or not, science increasingly depends on data being published in public machine readable formats. Those who spend their days looking one-at-a-time at the elements of a single cell signalling pathway in every tumour cell line available to them are wasting our money if they bury their data in a fragmented and closed publication record. Nobody reads those papers, and the individual fragments of data don't tell us anything. Journal publishers think they can ensure that data is correctly published, but so far their only great successes are with the likes of GenBank and MIAME, where journals have ensured that data be deposited in public databases outside of the journal format.
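The value of "machine readable" is easy to demonstrate: anyone, anywhere can pull a GenBank record into a script and compute over it, something no paper-bound figure allows. A minimal sketch using the Biopython library (the accession is just an example; NCBI asks for a contact email with each request):

```python
# Minimal sketch: fetch a GenBank record programmatically and walk its
# annotated features. Assumes the Biopython library is installed; the
# accession is only an example -- substitute any record of interest.
from Bio import Entrez, SeqIO

Entrez.email = "you@example.org"  # NCBI requires a contact address
handle = Entrez.efetch(db="nucleotide", id="NC_005816",
                       rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()

print(record.id, len(record.seq), "bp:", record.description)
for feature in record.features:
    if feature.type == "CDS":
        print(feature.qualifiers.get("gene", ["?"])[0], feature.location)
```

None of that is possible when the same data is a JPEG in a supplementary PDF.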
arXiv. Does this need any explanation? What does PLoS Currents offer that isn't already solved better by pre-print servers? Just a brand name that makes it look as though it's a journal. If you require rapid dissemination of important timely results and you want to go to the effort of writing a full traditional scientific paper, put it on a pre-print server while it's going through peer review in a real journal. Don't just abandon peer review while making it look like you've just published a real paper in a real journal.
Better yet, don't write a proper traditional paper. If you need rapid communication of important timely results, why waste time with all of the irrelevant trimmings of a scientific paper -- the in-depth background and discussion, and that list of a hundred references? Put these critical results on a blog with a few lines of explanation, and later submit the full paper for peer review in a real journal.
Credit where it's due
All the real scientists reading -- the ones looking for jobs and grants and promotion and tenure -- have spotted the one great big flaw in all these suggestions: credit. At least a paper in PLoS Currents can be listed in a CV. Nobody even reads blogs, let alone cites them. How can you get a grant on the back of a blog post? Am I suggesting you should be able to get a grant on the back of a blog post?
Maybe. I don't know. I don't think so. At the moment, publishing papers in journals is pretty much all a researcher can get any credit for. Asking researchers to go beyond the paper-in-journal format is going to create problems of assigning credit, and I don't know exactly what the solution to that problem might be. Put simply, I haven't spent much effort considering solutions. I'm a consumer rather than a creator of science, so that particular problem doesn't keep me awake at night. But there surely are solutions -- plenty of them.
Fact is, it's quite obvious to anyone in or observing science that the current method of ensuring that scientists are credited for their hard work is really quite broken. Trying to cram every new kind of "stuff" into that broken system is hardly helping.
Business models
Meanwhile, the publishers will be asking how we see the business models for these non-journal based methods of publishing working. Frankly, I'm not really interested. But then, JOVE is hardly the beacon of business success anyway. If publishers want science publishing to be a business, they need to find the new business models that work without strangling science. Otherwise, they're liable to find out that, on the web, some institutions and individual scientists can do a better job of disseminating science than the professionals can, and out of their own pocket.
The paper of the future
I don't necessarily think that anybody should stop writing papers -- perhaps not even the ones that nobody reads. The paper solves several problems better than any other proposed solution. A peer reviewed scientific paper, in a journal if you like, is as good a way as any to provide a permanent record of a unit of science done, and of a research group's interpretation of the significance of that unit of science. And it needn't change all that much. Making them shorter and a lot less waffley would be to my taste -- there's no need to put that much effort into words that won't be read. And give them semantic markup, animations, and comment threads, if you like. But don't pretend that those things are anything more than incremental advances. The real revolutions in the dissemination of science can only occur beyond the shackles of the traditional paper and journal. Every new Journal of Stuff is another step back.
Updates for 2010
Peter Murray-Rust has been saying interesting things about domain-specific data repositories, which I am sure are worth more attention than I have yet had time to give them.
When I originally posted this, I was challenged for not mentioning the problem of closed-access journals at all; that problem is addressed in the subsequent posts.

17 Aug 2010

What is the scientific paper? 1: Observations

This is a guest post by Joe Dunckley
Last year, after Science Online, I wrote a series of posts inspired by Ian Mulvany's question, what is the scientific paper? Those were originally posted on my old blog; now, with SoLo approaching once again, seems like a good time to revisit them, while migrating them over to Journalology.
Science Online charged us with answering the question, what is the scientific paper? Here is the answer. It comes from the perspective of somebody who has been middle author on just two, but who has spent a little bit of time working with them and with people who think a lot about them.
What does the scientific paper look like?

  • It's a few thousand words -- probably between 4 and 15 pages long (but can be <1 or >100 pages).
  • It's mostly prose text, with a few graphs, tables, and pictures.
  • It has a set matter-of-fact style and structure.
  • It's written in (American) English.[1]
What is in the scientific paper?
  • Who did the science.
  • Why the science was done.
  • How the science was done.
  • Data!
  • The authors' interpretation of what was achieved by doing the science.
  • Pointers to the other bits of science mentioned.
Where is the scientific paper?
  • It is in a journal, available in one or both of:
    • printed on 4-15 sheets of dead trees, between a pair of glossy (or not so glossy) covers in the basement of a library.
    • a journal website, possibly with technology deliberately designed to make it difficult and expensive to get to, probably only available in a clunky and poorly designed PDF file.
  • It might also be, in part or in full, in a searchable database, like PubMed.
  • If you're really lucky, it is available as HTML and XML.
What is the scientific paper for?
  • It aims to be a complete, objective, reliable, and permanent record of a unit of science done.
  • It's a way of telling your field what you've done.
  • It's a way of telling your field what you've found.
  • It's a way of giving data and resources to your field.
  • It's a (the?) way of proving to your (potential) employer/funder that you have done something worthwhile.
  • It's a way of making money for publishers.
How is the scientific paper made?
  • The authors are given some money and lab space on the condition that they use it to do some science and write a paper about it.[2]
  • The authors do some science and write a paper about it.
  • They give it to a journal. The journal thinks about it.
  • Peer review! Months of scrutiny, discussion, and revisions.
  • Production! The words are turned into PDFs and printed pages.
What is the scientific paper not?
  • Part of a conversation.
  • Quick and efficient.
  • Diverse and flexible.
  • Possible to edit after acceptance by the journal (except in extreme circumstances, and via slow and unsatisfactory mechanisms).
  • Possible to edit by anybody except "the authors".
  • A way of making your data and resources reusable.
  • A way of telling the layperson what you've done and found.
Wait, that wasn't really what the question meant, you say? Well, indeed. But before we get to the real questions -- "what's wrong with the scientific paper?" and "what do you suppose we do about that?" -- it's good to define some terms and lay out the basics. Do you think I've got any of my observations wrong, or think I've overlooked some important property of the scientific paper? Do say -- it would be good to try to agree on what the paper is before going any further.
Footnotes
  1. Thanks to Hannah, who added this point in the comments on the old blog.
  2. Thanks to Cameron Neylon, ditto.

Incentivising academic fraud

This is a guest post by Joe Dunckley
Catching up with the newsfeeds after a week working in Beijing (where citizens are saved from reading such subversive content as Journalology -- Blogspot blogs are blocked wholesale), I notice the Economist discussing academic fraud in China.

Being the Economist, it attempts to explain China's fraud epidemic by focusing on incentives:

China may be susceptible, suggests Dr Cong Cao, a specialist on the sociology of science in China at the State University of New York, because academics expect to advance according to the number, not the quality, of their published works. Thus reward can come without academic rigour. Nor do senior scientists, who are rarely punished for fraud, set a decent example to their juniors.
The trouble with this explanation is that these same incentives apply in many -- most -- other countries also. Science everywhere is plagued by the publish-or-perish game and the incentives it generates. Academic careers stand or fall on the basis of publication counts. Some countries at least try to judge quality of output in addition to quantity, but most methods are no more sophisticated than that used by China -- and every method has its incentives for fraud.

Nor does a lack of disincentives in China explain why they stand out. Fraud is rarely satisfactorily punished anywhere. If it is even discovered at all, the photoshopped figures and made-up numbers become an accident; the original data was lost sometime after that project was completed; the grad student who handled that particular experiment has moved on, and can no longer be contacted. A researcher getting fired for fraud is big news, not because fraud is rare, but because failing to weasel out of an allegation is rare.

It is my fear that China is perceived as having a higher rate of fraud compared to other countries not because it does, but because Chinese researchers aren't very good at it yet. Their fiddled figures are crude and easily spotted; their fictitious facts are amateur inventions that cannot be believed. The worrying thing about these rough and unrefined fabrications is not that they exist: easily found out, they can be struck from the record. The worrying fact is that they must be the tip of a great iceberg; 99% of the fakes go unseen, produced by forgers skilled enough to mask their work in convincing disguises and cover their tracks perfectly. As science in China matures, the student-to-supervisor ratio falls, and natural selection picks the cleverest conmen, the epidemic of clumsy and primitive fraud will end. That's when China joins the ranks of countries experiencing advanced and undetectable fraud epidemics.

Discussing fraud as a symptom of a Chinese problem -- of a failure of Chinese academic administration or a flaw in the Chinese culture and psyche -- is a nice distraction from the uncomfortable fact that fraud is a symptom of a global problem -- of failing academic administration everywhere. The Chinese copied the publish-or-perish game from the west. Soon they'll get good at it.

10 Aug 2010

New word - evoluating

"Evoluating". It's probably an attempt to use the French "évoluer" in English, I think it means "evolving".

6 Aug 2010

The Scientist has an attack of CNS disease

The Scientist this week tells us that
"Peer review isn’t perfect [who knew?]— meet 5 high-impact papers that should have ended up in bigger journals."
Wait, what? These high-impact papers got those citations despite ending up in "second tier" journals, so I doubt the authors have been crying into their beer about this "injustice". This is an example of CNS disease, a term coined by Harold Varmus to characterise the obsession with Cell, Nature and Science. Not all high-impact papers must be published in one of these journals, and not all papers published in these journals will be high impact. Biomedical publishing is not just a game in which editors sort articles by predicted future impact -- at least, I hope it's not.

Authors choose their publication venue for all sorts of reasons, and it's hard to predict which new work will set the world on fire. Take BLAST: it was a "quick and dirty" heuristic that gave similar results to the Smith-Waterman algorithm only much faster, and the gain in speed came at a loss of accuracy. Only use by scientists in practice could decide whether that was a good trade. Focussing on the umpteen thousand citations to BLAST is missing the point: the important thing about BLAST is the millions or billions of hours of computer time saved by using it.
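Where that saving comes from is visible in the algorithm itself: Smith-Waterman fills a full dynamic-programming matrix, so its cost grows with the product of the two sequence lengths, which is exactly the work BLAST's heuristics skip. A minimal sketch of the Smith-Waterman scoring recurrence (toy scoring parameters, score only, no traceback):

```python
# Minimal sketch of Smith-Waterman local alignment scoring, with toy
# scoring parameters and no traceback. The nested loops fill an
# (m+1) x (n+1) matrix of scores, which is why exact alignment is slow
# on genome-scale input -- and why BLAST's shortcuts matter.
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    prev = [0] * (len(b) + 1)  # rolling rows: only O(n) memory needed
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("ACACACTA", "AGCACACA"))
```

As Joe, the other denizen of Journalology Towers, said recently: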
"Lord protect us from the idea that an academic publication might have any value beyond its ability to accumulate citations."