30 Jan 2007

Billions of genes, billions of articles?

We receive, and sometimes publish, manuscripts that "clone and characterize" a new gene in a species. The gene is usually identified from a cDNA library and confirmed as present in the genomic DNA by PCR. Its expression is measured using RT-PCR, and perhaps a phylogeny is drawn that relates this gene to others within the same species and to its relatives in other species.

There are on the order of 1.5 million known species. A reasonable guess for the average number of genes per species is 5,000 (bacteria have in the range of 500-4,000 genes, eukaryotes 5,000-40,000). That puts the number of genes waiting to be cloned and characterized in known species alone at 7.5 billion (1.5 million × 5,000), more than one for every woman, man and child alive today.
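The estimate is simple multiplication, and it can be sketched in a few lines of Python using only the figures quoted above. The constant and function names are illustrative, and the calculation assumes, as the estimate does, that every known species has the same average gene count.

```python
# Back-of-envelope estimate using the figures quoted in the text.
KNOWN_SPECIES = 1_500_000      # approximate number of known species
AVG_GENES_PER_SPECIES = 5_000  # rough guess; bacteria ~500-4,000, eukaryotes ~5,000-40,000


def genes_to_characterize(species: int, genes_per_species: int) -> int:
    """Total genes, assuming every species has the same average gene count."""
    return species * genes_per_species


total = genes_to_characterize(KNOWN_SPECIES, AVG_GENES_PER_SPECIES)
print(f"{total:,}")  # prints 7,500,000,000
```

Swapping in the extremes of the quoted ranges (500 or 40,000 genes per species) moves the total between 750 million and 60 billion, so the conclusion does not hinge on the exact average chosen.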

I think this nicely illustrates Mark Gerstein's recent comment, in an article in BMC Bioinformatics, that "academic journals alone cannot capture the findings of modern genome-scale inquiry". Philip Bourne has made a similar point in the pages of PLoS Computational Biology: "Clearly, no one perceives a database entry of, say, a sequence, or a specimen in a museum collection, as being as valuable as the journal paper that describes it. But, ironically ... the database entry may indeed be more valuable".

