Next-generation sequencing data analysis
Author(s) - Christian T. K.H. Stadtländer
Publication year - 2017
Publication title - Briefings in Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.204
H-Index - 113
eISSN - 1477-4054
pISSN - 1467-5463
DOI - 10.1093/bib/bbx038
Subject(s) - computer science, DNA sequencing, computational biology, biology, genetics, DNA
Next-Generation Sequencing (NGS) has become one of the most important tools in the field of human genetics. In the course of this PhD project, targeted resequencing of the coding part of the human genome (exome sequencing) was performed on more than 4,500 samples from over 80 different projects. The samples were sequenced to identify pathogenic variants and disease-associated genes in rare and common diseases. The aim of this PhD project was to investigate and develop methods and parameters for identifying such pathogenic variants and genes from large amounts of exome sequencing data. An existing analysis pipeline was extensively modified to reduce runtime, memory usage, required disk space and hands-on time, as well as to increase flexibility and allow easier adaptation and extension. Additionally, new features were implemented to allow the analysis of other aspects of the data, such as Structural Variants (SVs) or Copy Number Variations (CNVs), and to allow multiple users to analyze large projects collaboratively. The data produced during this PhD project were used to evaluate requirements on study design and certain key quality metrics of exome sequencing data. Several programs and strategies for variant calling were benchmarked. The influence of different variant calling procedures and variant quality metrics on sensitivity and specificity was evaluated and used to draw conclusions on best-practice variant calling. Additionally, variant calling in RNA sequencing data for the detection of RNA editing is discussed. Variant callers detect on average approximately 23,000 high-quality coding variants per exome. Guidelines on filtering and selecting these variants in order to identify those that are disease-causing have been developed and are illustrated by examples where applicable.
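The benchmarking of variant callers by sensitivity and specificity mentioned in the abstract can be illustrated with a minimal sketch. It is not the method used in the work described; it assumes that a call set and a trusted reference ("truth") set are available as sets of (chromosome, position, ref, alt) tuples, and that the number of assessed sites is known so that true negatives can be counted. The names `benchmark_calls`, `Benchmark` and `n_assessed_sites` are hypothetical.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant representation: (chromosome, position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


class Benchmark(NamedTuple):
    sensitivity: float
    specificity: float


def benchmark_calls(called: Set[Variant], truth: Set[Variant],
                    n_assessed_sites: int) -> Benchmark:
    """Compare a call set against a truth set.

    Assumption: each assessed site carries at most one variant, so
    true negatives = assessed sites - (TP + FP + FN).
    """
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    tn = n_assessed_sites - tp - fp - fn
    if tn < 0:
        raise ValueError("n_assessed_sites is smaller than the number of variant sites")

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return Benchmark(sensitivity, specificity)


# Toy data, invented for illustration only.
truth = {("1", 100, "A", "G"), ("1", 200, "C", "T"),
         ("2", 300, "G", "A"), ("2", 400, "T", "C")}
called = {("1", 100, "A", "G"), ("1", 200, "C", "T"),
          ("2", 300, "G", "A"), ("3", 500, "A", "T")}

result = benchmark_calls(called, truth, n_assessed_sites=104)
print(result)
```

In this toy case three of the four true variants are recovered and one call is false, so sensitivity is 3/4 and specificity is 99/100. Across an exome, the number of non-variant sites is so large that specificity stays close to 1 even for a caller with many false positives, which is one reason to read it alongside the absolute false-positive count.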