Application of variant‐calling algorithms for Mendelian disorders: lessons from whole‐exome sequencing in Charcot–Marie–Tooth disease
Author(s) - Hong Y.B., Jung J., Jung S.C., Chung K.W., Choi B.O.
Publication year - 2014
Publication title - Clinical Genetics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.543
H-Index - 102
eISSN - 1399-0004
pISSN - 0009-9163
DOI - 10.1111/cge.12281
Subject(s) - exome sequencing, dbSNP, genetics, DNA sequencing, SNP, biology, single nucleotide polymorphism, algorithm, computational biology, genotype, mutation, computer science, gene
To the Editor: In contrast with traditional genetic approaches for the isolation of disease-causing mutations, which have taken considerable time and effort, the recent advent of next-generation sequencing (NGS) has broadened and accelerated the genetic analysis of various disorders. For example, a mutation in the TRK-fused gene, which causes hereditary motor and sensory neuropathy with proximal dominance, eluded isolation for 15 years until NGS was applied (1, 2). In addition, the application of whole-exome sequencing (WES) to clinically and genetically heterogeneous diseases has been effective in unveiling their genetic causes. Currently, there are numerous single-nucleotide polymorphism (SNP)-calling algorithms based on genotype likelihood, Phred score-based calling, and simple read error rate (3, 4). In addition, a systematic assessment of SNP-calling methods has been reported that used several measures of false-positive SNP calls: transition/transversion ratio, dbSNP rate, and non-reference discrepancy rate. However, these algorithms exhibit low sensitivity for low-coverage data, and their outcomes tend to vary from one algorithm to another. In addition to the accuracy of the called variants, the total number of candidates is also critical to researchers. Empirically, we obtain 500–1000 candidate variants per sample when isolating rare variants. Additional analysis of three family members may reduce the candidates to fewer than 10, which is still too many to verify experimentally. Thus, the successful identification of the genetic cause using WES alone requires additional strategies. Here, we sought a solution in the variant-calling algorithms themselves by retrospectively analyzing the performance of two of the most popular calling algorithms, SAMtools mpileup and GATK UnifiedGenotyper, during the isolation of the genetic causes of Charcot–Marie–Tooth (CMT) disease. We performed WES in 244 individuals from 161 families.
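Of the false-positive measures named above, the transition/transversion (Ts/Tv) ratio is the simplest to compute from a call set. The following is a minimal sketch in Python, not the method used in the assessment cited; the function names and the toy `(ref, alt)` pairs are hypothetical and serve only as an illustration. A call set enriched in false positives generally shows a lower Ts/Tv ratio than a clean one, which is why the ratio is used as a quality indicator.

```python
# Purines and pyrimidines: a transition stays within one class,
# a transversion crosses from one class to the other.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """Return True if the single-base change ref>alt is a transition."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snvs):
    """Compute Ts/Tv from an iterable of (ref, alt) alleles.

    Indels, ambiguous bases, and records where ref equals alt are skipped.
    Returns None when no transversion is present (ratio undefined).
    """
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else None


# Hypothetical toy calls (not data from the study): three transitions,
# two transversions, and one deletion that is ignored.
toy_calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("AT", "A"), ("T", "G")]
ratio = ts_tv_ratio(toy_calls)
```

In practice the `(ref, alt)` pairs would be read from the REF and ALT columns of each caller's output, and the ratio would be compared between call sets rather than interpreted in isolation.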
The whole exome was captured using the SeqCap EZ Human Exome Library (v2.0 or v3.0; Roche NimbleGen, Madison, WI), and NGS was performed using a Solexa GAIIx or a HiSeq 2000 Genome Analyzer (Illumina, San Diego, CA). Reads were mapped to UCSC hg19 using BWA, followed by indexing and variant calling using SAMtools (Phred quality score of 3). Using these settings, we identified genetic causes of CMT in 77 families (107 samples). For cases with unidentified causes, we reanalyzed the data using GATK (Phred quality score of 10) and could identify only two additional genetic causes. Thus, the two algorithms performed similarly in identifying CMT-associated mutations. Nonetheless, there were significant differences in variant calling between the two algorithms. The numbers of total and functionally significant variants, that is, those affecting the primary structure of proteins, were greater with GATK than with SAMtools (Fig. 1a,b). In our experience, most causative SNPs in CMT are unreported in dbSNP135. Intriguingly, the ratio of concurrently called variants was significantly lower among unreported variants than among total or functionally significant variants (Fig. 1c). These data suggest that there might be many false positives among the candidate variants. To address this concern further, we traced back to the initial steps of the analysis. First, we investigated whether GATK called the 77 causative variants identified by SAMtools. Including the two causative variants called by GATK only, 112 causative variants (including several pairs of autosomal-recessive alleles and identical causative variants called repeatedly from family members) were called, 98% of which were called by both algorithms (Fig. 1d). Next, we investigated the false-positive ratio. To confirm the candidate variants, we performed capillary sequencing for 489 variants.
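The comparison described above, which sorts calls into those made by both algorithms and those made by one alone and then checks each class against capillary sequencing, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: variants are assumed to be already parsed into `(chrom, pos, ref, alt)` tuples, the function names are hypothetical, and the toy call sets and validation outcomes are invented for the example rather than taken from the study.

```python
from collections import Counter


def classify_calls(samtools_calls, gatk_calls):
    """Label each variant as 'both', 'samtools_only', or 'gatk_only'.

    Each input is an iterable of (chrom, pos, ref, alt) tuples.
    """
    s, g = set(samtools_calls), set(gatk_calls)
    labels = {}
    for v in s & g:
        labels[v] = "both"
    for v in s - g:
        labels[v] = "samtools_only"
    for v in g - s:
        labels[v] = "gatk_only"
    return labels


def false_positive_ratio(labels, validation):
    """Fraction of tested variants per class that failed confirmation.

    validation maps a variant tuple to True (confirmed) or False (not confirmed).
    Variants absent from labels are ignored; untested classes are omitted.
    """
    tested, failed = Counter(), Counter()
    for variant, confirmed in validation.items():
        cls = labels.get(variant)
        if cls is None:
            continue
        tested[cls] += 1
        if not confirmed:
            failed[cls] += 1
    return {cls: failed[cls] / tested[cls] for cls in tested}


# Hypothetical toy data, not results from the study.
samtools_toy = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")]
gatk_toy = [("chr2", 200, "C", "T"), ("chr3", 300, "G", "A"), ("chr4", 400, "T", "C")]
validation_toy = {
    ("chr1", 100, "A", "G"): False,  # SAMtools only, not confirmed
    ("chr2", 200, "C", "T"): True,   # both callers, confirmed
    ("chr3", 300, "G", "A"): True,   # both callers, confirmed
    ("chr4", 400, "T", "C"): False,  # GATK only, not confirmed
}

labels = classify_calls(samtools_toy, gatk_toy)
fp = false_positive_ratio(labels, validation_toy)
```

Matching on exact tuples is adequate for single-nucleotide variants; indels may be represented differently by different callers, so a real comparison would likely need the alleles normalized before the sets are intersected.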
The overall false-positive ratio (15.9%) was not as high as expected; however, 79.5% and 65.8% of the variants called by SAMtools alone and by GATK alone, respectively, were false positives, whereas the ratio was only 4.4% among concurrently called variants (Fig. 1e). The false-positive ratio was slightly higher in indel variants (24.1%) than in single-nucleotide variants (15.4%). Therefore, these data suggest that selecting variants called by both algorithms reduces the false-positive ratio. The major obstacle to the determination of causative variants in small pedigrees with rare Mendelian traits is