
Sequence-based predictive modeling to identify cancerlectins
Author(s) -
Hoang M. Lai,
Xinxin Chen,
Wei Chen,
Hua Tang,
Hao Lin
Publication year - 2017
Publication title -
Oncotarget
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.373
H-Index - 127
ISSN - 1949-2553
DOI - 10.18632/oncotarget.15963
Subject(s) - jackknife resampling , benchmark (surveying) , support vector machine , computational biology , feature (linguistics) , computer science , artificial intelligence , sequence (biology) , machine learning , bioinformatics , pattern recognition (psychology) , biology , mathematics , statistics , genetics , linguistics , philosophy , geodesy , estimator , geography
Lectins are a diverse class of glycoproteins, or carbohydrate-binding proteins, that are widely distributed across species. They specifically recognize and bind to particular saccharide groups. Cancerlectins are lectins that are closely related to cancer and play a major role in tumor initiation, survival, growth, metastasis and spread. Several computational methods have emerged to discriminate cancerlectins from non-cancerlectins, which promotes the study of the pathogenic mechanisms and clinical treatment of cancer. However, the predictive accuracy of most of these methods is limited. In this work, we constructed a benchmark dataset from the CancerLectinDB database, developed a new amino acid sequence-based strategy for feature description, and applied the binomial distribution to screen for the optimal feature set. Finally, an SVM-based predictor was built to distinguish cancerlectins from non-cancerlectins; it achieved an accuracy of 77.48% with an AUC of 85.52% in jackknife cross-validation. These results indicate that our prediction model performs better than published predictive tools.
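The feature-screening step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the sequence encoding, so adjacent dipeptide counts are used here purely as an assumed stand-in, and the scoring rule (confidence = 1 minus the binomial tail probability, with the class prior taken as that class's share of all counted dipeptides) is one common formulation rather than necessarily the authors' exact procedure. The function names `dipeptide_counts`, `binomial_tail` and `rank_features` are hypothetical.

```python
from itertools import product
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard residues (assumed encoding).
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_counts(seq):
    """Count each adjacent dipeptide in seq; non-standard residues are skipped."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return counts


def binomial_tail(n, k, q):
    """Return P(X >= k) for X ~ Binomial(n, q)."""
    return sum(comb(n, m) * q**m * (1 - q)**(n - m) for m in range(k, n + 1))


def rank_features(class_seqs):
    """Rank dipeptides by binomial confidence.

    class_seqs maps a class label to a list of sequences and must contain
    at least one countable dipeptide overall. Returns (dipeptide, confidence)
    pairs sorted from most to least class-specific.
    """
    per_class = {}
    for label, seqs in class_seqs.items():
        total = dict.fromkeys(DIPEPTIDES, 0)
        for s in seqs:
            for dp, c in dipeptide_counts(s).items():
                total[dp] += c
        per_class[label] = total

    grand_total = sum(sum(t.values()) for t in per_class.values())
    scores = []
    for dp in DIPEPTIDES:
        n_i = sum(t[dp] for t in per_class.values())  # occurrences in all classes
        best = 0.0
        for total in per_class.values():
            q = sum(total.values()) / grand_total  # prior for this class
            p = binomial_tail(n_i, total[dp], q)   # chance of seeing this many or more
            best = max(best, 1.0 - p)              # higher means less likely by chance
        scores.append((dp, best))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores


# Toy data, invented for illustration only.
toy = {"cancerlectin": ["AAAAAA", "AAAA"], "non-cancerlectin": ["CCCCCC", "CCCC"]}
ranked = rank_features(toy)
print(ranked[:2])
```

In a full pipeline, the top-ranked features would then be fed to an SVM and evaluated with jackknife (leave-one-out) cross-validation. For realistic dataset sizes, the exact-integer tail sum above would overflow when converted to floating point, so a numerically stable routine such as `scipy.stats.binom.sf(k - 1, n, q)` would be a more practical choice.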