Predicting the change of exon splicing caused by genetic variant using support vector regression
Author(s) - Chen Ken, Lu Yutong, Zhao Huiying, Yang Yuedong
Publication year - 2019
Publication title - Human Mutation
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.981
H-Index - 162
eISSN - 1098-1004
pISSN - 1059-7794
DOI - 10.1002/humu.23785
Subject(s) - RNA splicing, biology, support vector machine, exon, robustness (evolution), computational biology, regression, feature selection, genetics, Pearson product moment correlation coefficient, machine learning, artificial intelligence, gene, computer science, statistics, mathematics, RNA
Alternative splicing can be disrupted by genetic variants that are associated with diseases such as cancer. Characterizing how genetic variants influence alternative splicing will improve the understanding of variant pathogenesis. Here, we developed a new approach, PredPSI-SVR, to predict the impact of variants on exon skipping events using support vector regression. From the sequence of a given exon and its flanking regions, 42 features related to splicing events were extracted. Using a greedy feature selection algorithm, we identified the eight features that contributed most to the prediction. The trained model achieved a Pearson correlation coefficient (PCC) of 0.570 in 10-fold cross-validation on the training data set provided by the "vex-seq" challenge of the 5th Critical Assessment of Genome Interpretation. In the blind test held by the same challenge, our predictions ranked second with a PCC of 0.566, demonstrating the robustness of our method. A further test indicated that PredPSI-SVR is helpful in prioritizing deleterious synonymous mutations. The method is available at https://github.com/chenkenbio/PredPSI-SVR.
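The abstract does not say how the greedy feature selection was run or how the PCC was computed, so the following is only a minimal sketch of one common way to combine the two: forward selection that adds, at each step, the feature giving the largest gain in pooled 10-fold cross-validated PCC, and stops when no feature improves it. Everything in it is an assumption made for illustration. The function names, the toy data, and the stopping rule are hypothetical, and a k-nearest-neighbour regressor stands in for the support vector regression so that the sketch runs on the Python standard library alone. It is not the PredPSI-SVR implementation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def knn_predict(train_X, train_y, row, cols: List[int], k: int = 5) -> float:
    """Stand-in regressor (NOT an SVR): mean target of the k nearest training rows,
    with squared Euclidean distance measured on the columns in `cols` only."""
    dists = sorted(
        (sum((t[c] - row[c]) ** 2 for c in cols), ty)
        for t, ty in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(ty for _, ty in nearest) / len(nearest)


def cv_pcc(X, y, cols: List[int], n_folds: int = 10) -> float:
    """PCC between targets and out-of-fold predictions pooled over all folds."""
    preds = [0.0] * len(y)
    for fold in range(n_folds):
        # Interleaved folds: sample i belongs to fold i % n_folds.
        train_idx = [i for i in range(len(y)) if i % n_folds != fold]
        tX = [X[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        for i in range(fold, len(y), n_folds):
            preds[i] = knn_predict(tX, ty, X[i], cols)
    return pearson(preds, y)


def greedy_forward_selection(
    n_features: int, score_fn: Callable[[List[int]], float]
) -> Tuple[List[int], float]:
    """Add the best-scoring feature each round; stop when the score stops improving."""
    selected: List[int] = []
    best = -math.inf
    remaining = list(range(n_features))
    while remaining:
        score, feat = max((score_fn(selected + [f]), f) for f in remaining)
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


def make_toy_data(n: int = 200, n_features: int = 5, seed: int = 0):
    """Synthetic data (hypothetical): only features 0 and 2 carry signal."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n)]
    y = [3 * r[0] - 2 * r[2] + rng.gauss(0, 0.1) for r in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    selected, best = greedy_forward_selection(
        len(X[0]), lambda cols: cv_pcc(X, y, cols)
    )
    print("selected features:", selected, "cross-validated PCC:", round(best, 3))
```

Because `greedy_forward_selection` only sees a scoring callback, the stand-in regressor can be replaced without touching the selection loop, for example with a scorer built on scikit-learn's `sklearn.svm.SVR`. The repository linked above is the authoritative source for the features and model settings actually used.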