
DNN-Boost: Somatic mutation identification of tumor-only whole-exome sequencing data using deep neural network and XGBoost
Author(s) -
Firda Aminy Maruf,
Rian Putra Pratama,
Giltae Song
Publication year - 2021
Publication title -
Journal of Bioinformatics and Computational Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.339
H-Index - 43
eISSN - 1757-6334
pISSN - 0219-7200
DOI - 10.1142/s0219720021400175
Subject(s) - exome sequencing, overfitting, exome, mutation, artificial intelligence, computer science, benchmark (surveying), artificial neural network, biology, computational biology, genetics, gene, geodesy, geography
Detecting somatic mutations in whole-exome sequencing data can help elucidate the mechanisms of tumor progression. Most computational approaches require exome sequencing of both tumor and normal samples. In practice, however, it is more common to sequence only the tumor exome, without a paired normal sample. To include such data in extensive studies of tumorigenesis, an approach is needed that identifies somatic mutations from tumor exome sequencing data alone. In this study, we designed a machine learning approach that uses a deep neural network (DNN) and XGBoost to identify somatic mutations in tumor-only exome sequencing data, and we integrated it into a pipeline called DNN-Boost. The XGBoost algorithm extracts features from the results of variant callers, and these features are then fed into the DNN model as input. XGBoost also addresses the issues of missing values and overfitting. We evaluated the proposed model and compared its performance with existing benchmark methods. The DNN-Boost classification model outperformed the benchmark method in classifying somatic mutations from both paired tumor-normal exome data and tumor-only exome data.