
Prediction of human disease‐associated phosphorylation sites with combined feature selection approach and support vector machine
Author(s) - Xu Xiaoyi, Li Ao, Wang Minghui
Publication year - 2015
Publication title - IET Systems Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.367
H-Index - 50
eISSN - 1751-8857
pISSN - 1751-8849
DOI - 10.1049/iet-syb.2014.0051
Subject(s) - support vector machine, feature selection, phosphorylation, random forest, redundancy (engineering), computer science, machine learning, artificial intelligence, data mining, computational biology, selection (genetic algorithm), naive bayes classifier, bioinformatics, biology, genetics, operating system
Phosphorylation is a crucial post‐translational modification that regulates almost all cellular processes. It has long been recognised that protein phosphorylation is closely related to disease, and many studies have therefore been undertaken to predict phosphorylation sites for disease treatment and drug design. However, despite the success of these approaches, none focuses on predicting disease‐associated phosphorylation sites. Herein, the authors propose, for the first time, an approach specifically designed to identify associations between phosphorylation sites and human diseases. To take full advantage of local sequence information, they develop a combined feature selection method‐based support vector machine (CFS‐SVM) that incorporates a minimum‐redundancy‐maximum‐relevance filtering process and a forward feature selection process. Performance evaluation shows that CFS‐SVM performs significantly better than widely used classifiers, including Bayesian decision theory, k nearest neighbour and random forest. Even at an extremely high specificity of 99%, CFS‐SVM still achieves a high sensitivity. In addition, tests on extra data confirm the effectiveness and general applicability of the CFS‐SVM approach to a variety of diseases. Finally, analysis of the selected features and corresponding kinases also helps in understanding the potential mechanisms of disease‐phosphorylation relationships and can guide further experimental validation.
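
The two-stage selection described in the abstract (a minimum‐redundancy‐maximum‐relevance filter followed by forward selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy residue windows and the labels are all hypothetical, the filter uses the common "relevance minus mean redundancy" form of mRMR on categorical features, and the forward step is scored with a simple lookup-table accuracy as a stand-in for the cross-validated SVM performance that a real pipeline would presumably use.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr_rank(columns, labels, k):
    """Greedy mRMR filter: pick the feature with the highest relevance to the
    labels minus its mean redundancy with the features already picked."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected, remaining = [], list(range(len(columns)))
    while remaining and len(selected) < k:
        def mrmr_score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        best = max(remaining, key=mrmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


def forward_select(ranked, score):
    """Walk the ranked features in order and keep one only if adding it
    strictly improves score(subset)."""
    chosen, best = [], float("-inf")
    for j in ranked:
        s = score(chosen + [j])
        if s > best:
            chosen, best = chosen + [j], s
    return chosen, best


def table_accuracy(rows, labels, subset):
    """Stand-in scorer (assumption): training accuracy of a majority-vote
    lookup table over the selected positions. Swap in cross-validated SVM
    performance for real use."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(tuple(row[j] for j in subset), Counter())[y] += 1
    return sum(max(c.values()) for c in groups.values()) / len(rows)


# Hypothetical toy data: each string is a 4-residue window, one feature per position.
# Position 1 duplicates position 0 (redundant); position 3 is unrelated to the label.
rows = ["KRPL", "KREL", "KRPV", "KREV", "AGPL", "AGEL", "AGPV", "AGEV"]
labels = [1, 1, 1, 1, 1, 0, 1, 0]  # 1 = disease-associated (made up for illustration)

columns = list(zip(*rows))
ranked = mrmr_rank(columns, labels, k=len(columns))
chosen, best = forward_select(ranked, lambda subset: table_accuracy(rows, labels, subset))

print(ranked)        # [0, 2, 3, 1]
print(chosen, best)  # [0, 2] 1.0
```

On this toy input the filter ranks the duplicated position last, because its redundancy with position 0 outweighs its relevance, and the forward step keeps only positions 0 and 2, since no further feature improves the score.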