
Rapid automated validation, annotation and publication of SARS-CoV-2 sequences to GenBank
Author(s) - Beverly A. Underwood, Linda Yankie, Eric P. Nawrocki, Vasuki Palanigobu, Sergiy Gotvyanskyy, Vincent C. Calhoun, Michael Kornbluh, Thomas Smith, Lydia Fleischmann, Denis Sinyakov, Colleen J Bollin, Ilene Karsch-Mizrachi
Publication year - 2022
Publication title - Database
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.406
H-Index - 62
ISSN - 1758-0463
DOI - 10.1093/database/baac006
Subject(s) - GenBank, annotation, genome, computer science, usability, pandemic, computational biology, biology, COVID-19, genetics, infectious disease (medical specialty), medicine, artificial intelligence, gene, disease, human–computer interaction, pathology
Rapid response to the current coronavirus disease 2019 (COVID-19) pandemic requires fast dissemination of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomic sequence data, so that diagnostic tests and vaccines can keep pace with the natural evolution of the virus as it spreads around the world. To facilitate this, the National Library of Medicine's National Center for Biotechnology Information developed an automated pipeline for the deposition and quick processing of SARS-CoV-2 genome assemblies into GenBank for the user community. The pipeline ensures the collection of contextual information about the virus source, assesses sequence quality and annotates descriptive biological features, such as protein-coding regions and mature peptides. The process promotes standardized nomenclature and creates and publishes fully processed GenBank files within minutes of deposition. As of 21 October 2021, the software had processed and published 982 454 annotated SARS-CoV-2 sequences. This development addresses the needs of the scientific community as the sequencing of SARS-CoV-2 genomes increases and will facilitate unrestricted access to and usability of SARS-CoV-2 genomic sequence data, providing important reagents for scientific and public health activities in response to the COVID-19 pandemic.
Database URL: https://submit.ncbi.nlm.nih.gov/sarscov2/genbank/
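
As a rough illustration of the first two checks the abstract mentions (contextual information about the virus source, and sequence quality), the following minimal Python sketch validates a single submission. It is not the NCBI pipeline: the function name, the required metadata fields and both thresholds are hypothetical, chosen only to make the idea concrete, and feature annotation is not attempted.

```python
from typing import Dict, List

# Hypothetical values for illustration only; not taken from the NCBI pipeline.
REQUIRED_METADATA = ("isolate", "collection_date", "country", "host")
MIN_LENGTH = 50          # assumed minimum sequence length
MAX_N_FRACTION = 0.5     # assumed maximum fraction of ambiguous 'N' bases


def check_submission(sequence: str, metadata: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means 'pass'."""
    problems = []

    # Contextual information: every required field must be present and non-empty.
    for key in REQUIRED_METADATA:
        if not metadata.get(key):
            problems.append(f"missing metadata: {key}")

    # Sequence quality: reject very short or mostly ambiguous sequences.
    seq = sequence.upper()
    if len(seq) < MIN_LENGTH:
        problems.append("sequence too short")
    elif seq.count("N") / len(seq) > MAX_N_FRACTION:
        problems.append("too many ambiguous bases")

    return problems


if __name__ == "__main__":
    meta = {"isolate": "example-1", "collection_date": "2021-10-21",
            "country": "USA", "host": "Homo sapiens"}
    print(check_submission("ACGT" * 20, meta))
    print(check_submission("N" * 80 + "ACGT" * 5, {}))
```

Returning a list of problems rather than raising on the first failure lets a submitter see every issue with a record at once, which suits a workflow that aims to publish within minutes of deposition.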