Open Access
Evidence Suggesting That a Fifth of Annotated Caenorhabditis elegans Genes May Be Pseudogenes
Author(s) - Andrew Mounsey, Petra Bauer, Ian A. Hope
Publication year - 2002
Publication title - Genome Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 9.556
H-Index - 297
eISSN - 1549-5469
pISSN - 1088-9051
DOI - 10.1101/gr.208802
Subject(s) - pseudogene, biology, Caenorhabditis elegans, gene, genetics, genome, Caenorhabditis, computational biology
Only a minority of the genes identified by computer analysis of the Caenorhabditis elegans genome sequence have been characterized experimentally. We attempted to determine the expression patterns for a random sample of the annotated genes using reporter gene fusions. The success rate was low for genes that were duplicated recently in evolution. Analysis of the data suggests that this is not due to conditional or low-level expression. The remaining explanation is that most of the annotated genes in the recently duplicated category are pseudogenes, a proportion corresponding to 20% of all annotated C. elegans genes. Further support for this surprisingly high figure was sought by comparing sequences within families of recently duplicated C. elegans genes. Although the analysis was only preliminary, clear evidence of recent inactivation by genetic drift was found for many genes in the recently duplicated category. At least 4% of the annotated C. elegans genes can be recognized as pseudogenes simply from closer inspection of the sequence data. Lessons learned in identifying pseudogenes in C. elegans could be of value in annotating the genomes of other species, where pseudogenes may be fewer but harder to detect.