13 May 2010

Why you can't copy abstracts into Wikipedia

This is a guest post by Joe Dunckley
This is an archival repost of something first published elsewhere a year ago.

I am not a lawyer, but I do have six years' experience of Wikipedia, was once a very prolific Wikipedian, and, despite my lack of activity there in more recent years, am apparently still an "admin" on the English-language Wikipedia. This, coupled with working for an open-access publisher, means that I have also picked up a little knowledge of (mostly US & UK) copyright over the years. Since I can't boil all that down to just 250 characters (or whatever the limit is), this post serves to answer this question, raised at FriendFeed: 'Does an article in pubmed belong to the "legal public domain", can I copy and paste it in wikipedia?'

The answer is 'no'. I don't endorse this position, and I'm not trying to be a killjoy, but it is the correct answer nonetheless. Since there appears to be some confusion over why the answer is 'no', let me explain. First I'll define some terms, then describe the copyright status of journal abstracts, and finally explain why the policy of Wikipedia must be to exclude abstracts.

First the definitions. Don't quote me on these. Like I say, IANAL. These are all just definitions that I have picked up over the years in the context of Wikipedia and open-access. In order of increasing protection of rights:

  • Public domain: completely exempt from all rights given by copyright law. Anyone can reprint it, remix it, and make money selling it, with no obligations.
  • Public/copyleft licensed, e.g. GFDL, CC-BY: the producer of the work has asserted their ownership and claimed their rights, but has voluntarily given everyone in the world permission to do certain things with the work without having to ask first. There are actually several tiers of these licenses.
  • Copyright: you're not allowed to do anything with the work, unless the copyright owner has said you can.

Public domain is not a synonym for "publicly available". Something is not "in the public domain" just because it is on the internet -- indeed, most of the internet is not public domain; it falls in that third category. There is no presumption that you are allowed to copy and paste material all over the internets. Perhaps there are corners of the internet where that is de facto the case, and perhaps it would be great if everything was public domain or copyleft, but it's not. Napster was a place where music was de facto public domain, before the recording industry reminded them that the law doesn't work that way. However, there is a fourth area to copyright: fair use.

Fair use is not a fourth category, like the categories above. Fair use is just a set of exemptions to copyright protections. It allows you to make use of copyrighted material without the owner's permission. However, it is very limited: you may only use a limited amount of the copyrighted material, and you can only do a limited range of things with it. If you want to use something copyrighted and say that you are doing so under fair use provisions, you have to make the case for your specific creation being fair use of the material. Getting away with claiming fair use for an abstract in PubMed does not mean that you will get away with it for Wikipedia, or some other creation. And the copyright owner is always within their rights to object to your fair use claim.

The copyright status of journal abstracts. Copyright to most journal abstracts will be owned by the journal's publisher (or society). Copyright to others will be owned by the authors. For open-access papers, the copyright is usually owned by the authors, but the journal has made sure that they have released it under a copyleft license, allowing you to do lots of things with their work. Papers written by employees of US federal agencies in the course of their employment will be public domain, as will very old papers.

It is true that there is a culture amongst scientists of free movement of published ideas. Copyright is worthless to a scientist, who actively wants his ideas to spread, so long as he is cited and acknowledged. Scientists freely share and reprint things like abstracts.

Scientists assume that publishers feel the same about all use of "their" material. Note the fierce and desperate opposition some of the traditional publishers raise against the open-access movement, though. Ideas mean different things to a scientist and to a (traditional) publisher. You shouldn't presume that publishers will react in the same laid-back way as scientists do when the words that "belong" to them are used in novel ways.

Note that PubMed carefully argues the case that its use of abstracts falls under fair use provisions. It doesn't just say "yeah, whatever, everyone freely reproduces abstracts, no one cares."

Why you can't turn abstracts into Wikipedia articles. Wikipedia can't be laid back about copyright any more than PubMed can. Wikipedia is now, what, a top-ten website by most metrics? People notice things that are put on Wikipedia. If you start putting abstracts on it, somewhere a publisher will notice, not like it, and have the material removed. You could claim fair use, but (and remember, IANAL), I very much doubt you would be successful: an encyclopedia is very different to an index, and in Wikipedia you are remixing the material. Well, whatever. One page gets deleted. No lasting harm done. End of story?

Not exactly. Wikipedia is GFDL. What you put on Wikipedia gets copied to hundreds of mirrors and put in paper versions. People use it for whatever commercial purposes they want, and it gets remixed to death. It's difficult to undo what goes into Wikipedia. That is why, when you write in Wikipedia, you must declare either that the words are your own, or that they are already released under a compatible copyleft license. You are not just giving permission for your words to be used on Wikipedia, you are giving permission for your words to be reused and remixed for virtually any purpose. This is why Wikipedia has to be pretty careful not to let copyright violations through.

This is also why Wikipedia does not actually allow any text to be contributed as fair use (except when marked as quotations): the permissions granted by Wikipedia are just too great for the fair use claim to be defensible.

But can't Wikipedia make an exception for abstracts? Theoretically, perhaps it could be done, but sadly, the reality is 'no'. Wikipedia is too big, too old, too well known, too bureaucratic. Wikipedia's policy on copyrights is well established; it must be generalist, covering all fields and all nations, and it can't afford to be lax. To come up with exceptions to the policy would be too difficult for such a generalist site with such a tiny legal team. The Wikipedians would have to establish beyond doubt that publishers were happy for their abstracts to be used not just on the encyclopedia, but by anyone, anywhere, for virtually any purpose, reprinted and remixed. And that sounds like the open-access movement to me.

Conclusions. You can't put (non-open access) abstracts on Wikipedia. It would be nice if the gentlemen's agreement whereby publishers overlooked reuse of their material by scientists extended to all spheres, but that ain't necessarily so. Of course, it would be great if it were so, and the story is just one of thousands which emphasise the need for a more rational and restricted copyright system.


Kat Wentworth said...

I would like to post abstracts of scientific articles in an online bibliography. Is this legal?

Thank you for your help!

Onkar said...

Can I copy a whole Wikipedia article, or its content, for my blog posts?

Mark said...

Much here very well-articulated -- albeit I'm in no condition, position myself to offer legal advice.

There's also much here I puzzled over. Key, most: Why/how all the distinctions between Pubmed and Wikipedia? Pubmed gets/asserts fair use yet Wikipedia NO because . . . ???

>"Ideas mean different things to a scientist and to a (traditional) publisher."

Copyright protects neither ideas nor facts to begin with -- only "expression."

>"Papers written by employees of US federal agencies in the course of their employment will be public domain . . . ."

What/why is NOT "in the course of their employment"? Why not the same rule for publicly funded research, research funded by tax-exempt charities, or papers from employees of other government agencies?