Development of a Python‐Based Algorithm for Comparative Analysis of Multiparticipant Next Generation Sequencing Data
Author(s) -
Martz Flora G.,
Forst Thomas M.,
Ryan Sean M.,
Murphy Patrick J. M.
Publication year - 2016
Publication title -
The FASEB Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.709
H-Index - 277
eISSN - 1530-6860
pISSN - 0892-6638
DOI - 10.1096/fasebj.30.1_supplement.629.3
Subject(s) - computer science, python (programming language), exome, source code, computational biology, exome sequencing, genetics, programming language, biology, phenotype, gene
The goal of this study was to develop a computational approach, using the Python programming language, to determine the extent to which genetic variation identified by whole exome sequencing, and single nucleotide polymorphisms (SNPs) in particular, correlates with previously observed phenotypic variation. An open source algorithm was developed using previously obtained whole exome data from otherwise healthy volunteers possessing extreme glucocorticoid‐response phenotypes. Python was selected as the programming language because of its cross‐platform operability, open source code, and precedent for managing omics‐level biological data. The program was designed to parse multiple participants’ whole exome sequencing files in sequence and to let the user select filtering parameters, generating comparative analyses of polymorphisms unique to, or shared among, participants of known phenotype. In addition, a proofreading mechanism and a basic user interface were incorporated into the code to enhance the scope and capability of the program. Additional filtering parameters include mutation type, significance of mutation, zygosity, sequence coverage, confidence interval, gene/transcript information, and coding significance. Creation of this program allowed for enhanced analytical evaluation of systems‐wide observations regarding phenotypic variance.
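The abstract does not publish the source code or the input file layout, so the following is only a minimal sketch of the kind of comparison it describes: parse each participant's variant file, apply user-defined filters, then report variants shared within a phenotype group and those unique to it. The tab-delimited layout, the column names (`chrom`, `pos`, `ref`, `alt`, `coverage`, `zygosity`), and the function names `load_variants` and `compare_groups` are all assumptions made for illustration, not the authors' implementation.

```python
import csv
import io


def load_variants(handle, min_coverage=0, zygosity=None):
    """Return the set of variants in one participant's file that pass the filters.

    Assumes a tab-delimited file with a header row containing the
    hypothetical columns chrom, pos, ref, alt, coverage, zygosity.
    """
    variants = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        if int(row["coverage"]) < min_coverage:
            continue  # drop poorly covered calls
        if zygosity is not None and row["zygosity"] != zygosity:
            continue  # keep only the requested zygosity
        variants.add((row["chrom"], int(row["pos"]), row["ref"], row["alt"]))
    return variants


def compare_groups(variants_by_participant, phenotype_by_participant, target):
    """Compare participants with phenotype `target` against everyone else.

    Returns (shared, unique): variants present in every participant of the
    target phenotype, and the subset of those absent from all others.
    """
    in_group = [v for p, v in variants_by_participant.items()
                if phenotype_by_participant[p] == target]
    out_group = [v for p, v in variants_by_participant.items()
                 if phenotype_by_participant[p] != target]
    shared = set.intersection(*in_group) if in_group else set()
    seen_elsewhere = set().union(*out_group)
    return shared, shared - seen_elsewhere


if __name__ == "__main__":
    header = "chrom\tpos\tref\talt\tcoverage\tzygosity\n"
    # Tiny made-up files standing in for per-participant exome output.
    files = {
        "P1": header + "chr1\t100\tA\tG\t30\thet\nchr3\t300\tG\tA\t40\thom\n",
        "P2": header + "chr1\t100\tA\tG\t25\thet\nchr3\t300\tG\tA\t50\thom\n",
        "P3": header + "chr3\t300\tG\tA\t35\thom\nchr4\t400\tT\tC\t20\thet\n",
    }
    phenotypes = {"P1": "high", "P2": "high", "P3": "low"}
    parsed = {p: load_variants(io.StringIO(text), min_coverage=10)
              for p, text in files.items()}
    shared, unique = compare_groups(parsed, phenotypes, "high")
    print("shared among high responders:", sorted(shared))
    print("unique to high responders:", sorted(unique))
```

Representing each participant as a set of `(chrom, pos, ref, alt)` tuples keeps the shared/unique comparison down to set intersection and difference. The other filters named in the abstract (mutation type, coding significance, and so on) could be added as further conditions in `load_variants` in the same way.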