Curation at the NCBI: Genomes, Genes, & Sequence Standards
Author(s) -
Garth Brown,
Catherine M. Farrell,
Jennifer Hart,
Melissa Landrum,
Donna Maglott,
B. Maidak,
Michael R. Murphy,
Terence D. Murphy,
Bhanu Rajput,
Kim D. Pruitt,
Lillian D. Riddick,
David Webb,
Janet A. Weber,
Wendy Wu
Publication year - 2009
Publication title -
Nature Precedings
Language(s) - English
Resource type - Journals
ISSN - 1756-0357
DOI - 10.1038/npre.2009.3287.1
Subject(s) - refseq, ensembl, data curation, genome project, annotation, gene nomenclature, genome, genome browser, reference genome, gene annotation, computational biology, gene, world wide web, biology, genetics, genomics, computer science, nomenclature, taxonomy (biology), botany
The National Center for Biotechnology Information (NCBI) provides curation support for many genomes and disseminates information through several resources, including Entrez Gene, reference sequences (RefSeq), the Consensus CDS (CCDS) database, and the Genome Reference Consortium (GRC). These projects are supported by several collaborations to provide: 1) support to the international consortium maintaining the assemblies for human and mouse (GRC); 2) sequence standards for chromosomes, genes, transcripts, and proteins (RefSeq); 3) reports of integrated information including nomenclature, publications, phenotypes and diseases, sequences, ontologies, and interactions (Gene); and 4) identification of proteins that are consistently annotated on the human and mouse reference genomes and consistently updated by collaborating members (CCDS).
NCBI curation of any one data type (e.g., a gene) is closely integrated with evaluation of the genome assembly and with determination of annotation by way of RefSeq transcript and protein sequences. Database and workflow infrastructure is designed to support reporting and tracking issues with the assembly, gene, or evidence data to collaborating groups, and to support collaborative review and discussion of issues that arise. Curation depends on publicly available information to represent the gene extent, alternatively spliced transcripts, and protein isoforms. Scientific consultations occur regularly, and some of the collaborations support wet-bench validation needs. Curation of genome annotation results in improved data presentation at the three major genome browser sites (Ensembl, NCBI, UCSC) and has resulted in efforts to define common curation guidelines to maximize consistency and minimize conflicts.
The presentation focuses on curation of the human genome, genes, and RefSeq sequence standards.