Contaminations in (meta)genome data: An open issue for the scientific community
Author(s) -
De Simone Giovanna,
Pasquadibisceglie Andrea,
Proietto Roberta,
Polticelli Fabio,
Aime Silvio,
Op den Camp Huub J.M.,
Ascenzi Paolo
Publication year - 2020
Publication title - IUBMB Life
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.132
H-Index - 113
eISSN - 1521-6551
pISSN - 1521-6543
DOI - 10.1002/iub.2216
Subject(s) - computational biology, correctness, genome, dna sequencing, computer science, metagenomics, data mining, data science, biology, genetics, dna, gene, programming language
In recent years, the high throughput and low cost of next-generation sequencing (NGS) technologies have led to an increase in the amount of (meta)genomic data, revolutionizing genomic research. However, the quality of sequencing data can be affected by experimental errors arising from defective methods and protocols. This is a serious problem for the scientific community, because it negatively affects the correctness of studies that involve genomic sequence analysis. As a countermeasure, several alignment and taxonomic classification tools have been developed to uncover and correct these errors. This critical review reports some of the integrated software tools and pipelines used to detect contaminations in reference genome databases and sequenced samples. In particular, it examines case studies of bacterial contaminations, contaminations of human origin, mitochondrial contaminations of ancient DNA, and cross-contaminations.