CPAP : Cancer Panel Analysis Pipeline
Author(s) -
Huang PoJung,
Yeh YuanMing,
Gan RueiChi,
Lee ChiChing,
Chen TingWen,
Lee ChengYang,
Liu Hsuan,
Chen ShuJen,
Tang Petrus
Publication year - 2013
Publication title -
Human Mutation
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.981
H-Index - 162
eISSN - 1098-1004
pISSN - 1059-7794
DOI - 10.1002/humu.22386
Subject(s) - dbSNP, pipeline (software), table (database), computer science, computational biology, data mining, software, visualization, DNA sequencing, biology, bioinformatics, single nucleotide polymorphism, genetics, gene, operating system, genotype
Targeted sequencing using next-generation sequencing technologies is currently being rapidly adopted for clinical sequencing and cancer marker tests. However, no existing bioinformatics tool is available for the analysis and visualization of multiple targeted sequencing datasets. In the present study, we use cancer panel targeted sequencing datasets generated by the Life Technologies Ion Personal Genome Machine Sequencer as an example to illustrate how to develop an automated pipeline for the comparative analyses of multiple datasets. Cancer Panel Analysis Pipeline (CPAP) uses standard output files from variant calling software to generate a distribution map of SNPs among all of the samples in a circular diagram generated by Circos. The diagram is hyperlinked to a dynamic HTML table that allows the users to identify target SNPs by using different filters. CPAP also integrates additional information about the identified SNPs by linking to an integrated SQL database compiled from SNP-related databases, including dbSNP, 1000 Genomes Project, COSMIC, and dbNSFP. CPAP only takes 17 min to complete a comparative analysis of 500 datasets. CPAP not only provides an automated platform for the analysis of multiple cancer panel datasets but can also serve as a model for any customized targeted sequencing project.
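The comparative step described in the abstract, collecting variant calls from many samples and then filtering for SNPs of interest, can be illustrated with a minimal Python sketch. This is not CPAP's code: the function names (`parse_variants`, `snp_distribution`, `filter_by_min_samples`), the sample names, and the toy records are all hypothetical, and the input is assumed to be tab-separated VCF-style text with the usual CHROM, POS, ID, REF, ALT leading columns.

```python
from collections import defaultdict


def parse_variants(vcf_text):
    """Return the set of (chrom, pos, ref, alt) tuples in VCF-style text."""
    variants = set()
    for line in vcf_text.splitlines():
        # Skip blank lines and header/meta lines.
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # A record may list several alternate alleles separated by commas.
        for allele in alt.split(","):
            variants.add((chrom, int(pos), ref, allele))
    return variants


def snp_distribution(samples):
    """Map each variant to the sorted list of sample names that carry it."""
    carriers = defaultdict(set)
    for name, vcf_text in samples.items():
        for variant in parse_variants(vcf_text):
            carriers[variant].add(name)
    return {variant: sorted(names) for variant, names in carriers.items()}


def filter_by_min_samples(distribution, min_samples):
    """Keep only variants observed in at least `min_samples` samples."""
    return {
        variant: names
        for variant, names in distribution.items()
        if len(names) >= min_samples
    }


# Toy input (hypothetical data, for illustration only).
HEADER = "#CHROM\tPOS\tID\tREF\tALT\n"
samples = {
    "sample_A": HEADER + "chr7\t55249071\t.\tC\tT\nchr12\t25398284\t.\tC\tA\n",
    "sample_B": HEADER + "chr12\t25398284\t.\tC\tA\n",
}

distribution = snp_distribution(samples)
shared = filter_by_min_samples(distribution, min_samples=2)
```

In this sketch `distribution` plays the role of the per-sample SNP map and `shared` the role of a filtered view; a real pipeline would additionally render the map (CPAP uses Circos) and annotate each variant from external databases.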