Consensus Genotyper for Exome Sequencing (CGES): improving the quality of exome variant genotypes
Author(s) -
Vassily Trubetskoy,
Álex Rodríguez,
Uptal Dave,
Nicholas G. Campbell,
Emily L. Crawford,
Edwin H. Cook,
James S. Sutcliffe,
Ian Foster,
Ravi Madduri,
Nancy J. Cox,
Lea K. Davis
Publication year - 2014
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btu591
Subject(s) - exome, exome sequencing, DNA sequencing, computer science, 1000 Genomes Project, ion semiconductor sequencing, computational biology, biology, genetics, mutation, genotype, gene, single nucleotide polymorphism
The development of cost-effective next-generation sequencing methods has spurred the development of high-throughput bioinformatics tools for detection of sequence variation. With many disparate variant-calling algorithms available, investigators must ask, 'Which method is best for my data?' Machine learning research has shown that so-called ensemble methods that combine the output of multiple models can dramatically improve classifier performance. Here we describe a novel variant-calling approach based on an ensemble of variant-calling algorithms, which we term the Consensus Genotyper for Exome Sequencing (CGES). CGES uses a two-stage voting scheme among four algorithm implementations. While our ensemble method can accept variants generated by any variant-calling algorithm, we used GATK2.8, SAMtools, FreeBayes and Atlas-SNP2 in building CGES because of their performance, widespread adoption and diverse but complementary algorithms.
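The abstract names a two-stage voting scheme but does not say what each stage does. The Python sketch below is therefore only an illustration under stated assumptions, not the CGES implementation: it assumes stage one keeps variant sites reported by a minimum number of callers, and stage two keeps a sample's genotype at a retained site only when enough callers agree on it. The function name `consensus_genotypes`, the input layout and both thresholds are hypothetical.

```python
from collections import Counter


def consensus_genotypes(calls, min_site_votes=None, min_gt_votes=None):
    """Two-stage vote over per-caller genotype calls (illustrative only).

    calls: {caller_name: {(chrom, pos, ref, alt): {sample: genotype}}}
    Both thresholds default to unanimity among the supplied callers.
    Returns {site: {sample: genotype or None}}; None marks "no consensus".
    """
    n_callers = len(calls)
    if min_site_votes is None:
        min_site_votes = n_callers
    if min_gt_votes is None:
        min_gt_votes = n_callers

    # Stage 1 (assumed): count how many callers report each site.
    site_votes = Counter(
        site for per_caller in calls.values() for site in per_caller
    )
    kept_sites = sorted(s for s, v in site_votes.items() if v >= min_site_votes)

    # Stage 2 (assumed): per sample, vote on the genotype at each kept site.
    result = {}
    for site in kept_sites:
        samples = set()
        for per_caller in calls.values():
            samples.update(per_caller.get(site, {}))

        consensus = {}
        for sample in sorted(samples):
            gt_votes = Counter(
                per_caller[site][sample]
                for per_caller in calls.values()
                if sample in per_caller.get(site, {})
            )
            ranked = gt_votes.most_common(2)
            top_gt, top_votes = ranked[0]
            tied = len(ranked) > 1 and ranked[1][1] == top_votes
            # Reject ties and genotypes with too few supporting callers.
            if tied or top_votes < min_gt_votes:
                consensus[sample] = None
            else:
                consensus[sample] = top_gt
        result[site] = consensus
    return result
```

In this sketch, a site seen by only one of four callers is dropped in the first stage, and a sample on which the callers disagree is set to `None` rather than assigned the majority genotype unless `min_gt_votes` is lowered. Real callers emit VCF, so genotype strings would first need to be parsed and normalized (for example, treating `0/1` and `1/0` as the same call), which is omitted here.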