Using Recursive Feature Selection with Random Forest to Improve Protein Structural Class Prediction for Low-Similarity Sequences
Author(s) - Yaoxin Wang, Yingjie Xu, Zhenyu Yang, Xiaoqing Liu, Qi Dai
Publication year - 2021
Publication title - Computational and Mathematical Methods in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.462
H-Index - 48
eISSN - 1748-6718
pISSN - 1748-670X
DOI - 10.1155/2021/5529389
Subject(s) - random forest, feature selection, computer science, redundancy (engineering), feature (linguistics), class (philosophy), selection (genetic algorithm), similarity (geometry), artificial intelligence, machine learning, structural similarity, pattern recognition (psychology), data mining, protein structure prediction, protein structure, biology, philosophy, linguistics, biochemistry, image (mathematics), operating system
Many combinations of protein features have been used to improve protein structural class prediction, but the information redundancy among them is often ignored. To select the important features with strong classification ability, we proposed a recursive feature selection method based on random forest. We evaluated the proposed method in four experiments and compared it with available competing prediction methods. The results indicate that the proposed feature selection method effectively improves the efficiency of protein structural class prediction: fewer than 5% of the features are used, yet prediction accuracy improves by 4.6-13.3%. We further compared different protein features and found that the predicted secondary structural features achieve the best performance. This understanding can be used to design more powerful prediction methods for the protein structural class.
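The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea of recursive feature selection driven by forest-style importances, not the authors' implementation. It assumes a simplified stand-in for random forest (bagged one-split decision stumps scored by Gini impurity decrease), removes one feature per round, and runs on synthetic toy data rather than protein features. All function names (`gini`, `best_split_gain`, `forest_importances`, `recursive_feature_elimination`, `make_toy_data`) and all parameter values are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_gain(X, y, feature):
    """Largest impurity decrease from a single threshold split on `feature`."""
    parent = gini(y)
    values = sorted({row[feature] for row in X})
    best = 0.0
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        children = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        best = max(best, parent - children)
    return best


def forest_importances(X, y, features, n_trees, rng):
    """Total impurity decrease credited to each feature over bagged stumps."""
    importance = {f: 0.0 for f in features}
    n_candidates = max(1, int(len(features) ** 0.5))
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        Xb = [X[i] for i in rows]
        yb = [y[i] for i in rows]
        candidates = rng.sample(features, n_candidates)  # random feature subset
        gains = {f: best_split_gain(Xb, yb, f) for f in candidates}
        winner = max(gains, key=gains.get)  # the stump splits on the best candidate
        importance[winner] += gains[winner]
    return importance


def recursive_feature_elimination(X, y, n_keep, n_trees=50, seed=0):
    """Repeatedly refit the ensemble and drop the least important feature."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        importance = forest_importances(X, y, features, n_trees, rng)
        weakest = min(features, key=importance.get)
        features.remove(weakest)
    return features


def make_toy_data(n_samples=40, seed=42):
    """Synthetic data: columns 0 and 1 separate the classes, columns 2-5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [label + rng.uniform(-0.2, 0.2), 3 * label + rng.uniform(-1, 1)]
        row += [rng.random() for _ in range(4)]
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected = recursive_feature_elimination(X, y, n_keep=2)
print("Selected feature indices:", selected)
```

On this toy data the two informative columns would be expected to survive while the noise columns are eliminated. For real feature sets, a full random forest with out-of-bag or cross-validated accuracy as the stopping criterion would be a more natural choice; scikit-learn's `RFE` combined with `RandomForestClassifier` is one commonly used option, though whether the paper used it is not stated here.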