
Combination between DE and SVM to enhance Protein Structure Prediction based on Secondary Structural information
Author(s) -
Thair A. Kadhim,
Mohammed Hasan Aldulaimi,
Suhaila Zainudin,
Azuraliza Abu Bakar
Publication year - 2019
Publication title -
International Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
ISSN - 2227-524X
DOI - 10.14419/ijet.v8i4.19619
Subject(s) - support vector machine , artificial intelligence , pattern recognition (psychology) , computer science , feature selection , classifier (uml) , data mining , machine learning , similarity (geometry) , image (mathematics)
Effective selection of protein features and accurate prediction of protein structural class (PSP) are important aspects of protein folding, especially for low-similarity sequences. Many promising approaches, mostly based on computational intelligence, have been proposed to solve this problem. One of the main aspects of the prediction is extracting a good representation of the protein sequence. In this study, a 71-dimensional integrated feature vector was extracted from secondary structure and hydropathy information, using newly developed strategies for classifying proteins into their four main structural classes: all-α, all-β, α/β, and α+β. A Support Vector Machine (SVM) and Differential Evolution (DE) were combined in a wrapper method to select the top N features according to their importance. Classification accuracy can be further improved by tuning the SVM kernel parameters during the training phase. The mean classification rate of the SVM classifier was used to evaluate each selected feature subset. The approach was tested on two low-similarity datasets (D640 and ASTRAL). The core of this work is a comparison between the proposed SVM+DE approach, which uses DE for feature selection, and an SVM+DE approach based on grid search (a traditional parameter-search method). The proposed SVM+DE model is competitive and highly reliable in terms of both running time and classification accuracy compared with other methods reported in the literature.
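The wrapper scheme described in the abstract (DE proposes feature subsets, and the SVM's mean classification rate scores them) can be sketched in plain Python. This is a minimal, hypothetical sketch, not the authors' implementation. The following are all assumptions: each DE vector is encoded as per-feature importance weights that are decoded to the top N features; the search uses the DE/rand/1/bin variant with illustrative settings for F, CR and population size; evaluation uses a 3-fold split; and a one-vs-rest linear SVM trained by sub-gradient descent stands in for a kernel SVM. Function names such as `de_select_features` are illustrative only.

```python
import math
import random


def decision(model, x):
    """Linear decision value w.x + b."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, epochs=50, lr=0.5):
    """Binary linear SVM (labels +1/-1) trained by full-batch sub-gradient
    descent on lam/2 * ||w||^2 + mean hinge loss (stand-in for a kernel SVM)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        eta = lr / math.sqrt(t)        # decaying step size
        gw = [lam * wj for wj in w]    # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * decision((w, b), xi) < 1:  # hinge loss is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b


def fit_ovr(X, labels):
    """One binary SVM per class (e.g. all-alpha, all-beta, alpha/beta, alpha+beta)."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels])
            for c in sorted(set(labels))}


def predict_ovr(models, x):
    # One-vs-rest: choose the class whose binary SVM scores x highest.
    return max(models, key=lambda c: decision(models[c], x))


def cv_accuracy(X, labels, feats, k=3):
    """Mean k-fold classification rate using only the columns listed in feats."""
    Xs = [[row[j] for j in feats] for row in X]
    accs = []
    for f in range(k):
        train = [i for i in range(len(Xs)) if i % k != f]
        test = [i for i in range(len(Xs)) if i % k == f]
        models = fit_ovr([Xs[i] for i in train], [labels[i] for i in train])
        hits = sum(predict_ovr(models, Xs[i]) == labels[i] for i in test)
        accs.append(hits / len(test))
    return sum(accs) / k


def de_select_features(X, labels, n_select, pop_size=8, generations=10,
                       F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin wrapper feature selection (pop_size must be >= 4).
    Returns (selected column indices, their mean CV accuracy)."""
    rng = random.Random(seed)
    d = len(X[0])

    def decode(v):
        # Assumed encoding: keep the n_select highest-weighted features,
        # sorted so the column order is stable between evaluations.
        return sorted(sorted(range(d), key=lambda j: v[j], reverse=True)[:n_select])

    pop = [[rng.random() for _ in range(d)] for _ in range(pop_size)]
    fit = [cv_accuracy(X, labels, decode(v)) for v in pop]
    for _ in range(generations):
        new_pop, new_fit = [], []
        for i in range(pop_size):
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(d)  # guarantees at least one mutated gene
            # Mutation (clipped to [0, 1]) plus binomial crossover with the target.
            trial = [
                min(1.0, max(0.0, pop[r1][j] + F * (pop[r2][j] - pop[r3][j])))
                if (rng.random() < CR or j == j_rand) else pop[i][j]
                for j in range(d)
            ]
            f_trial = cv_accuracy(X, labels, decode(trial))
            # Greedy selection: the trial replaces its target if it is no worse.
            keep_trial = f_trial >= fit[i]
            new_pop.append(trial if keep_trial else pop[i])
            new_fit.append(f_trial if keep_trial else fit[i])
        pop, fit = new_pop, new_fit
    best = max(range(pop_size), key=lambda i: fit[i])
    return decode(pop[best]), fit[best]
```

As a usage example, calling `de_select_features(X, labels, n_select=20)` on a list of 71-element feature vectors with their class labels would return the chosen column indices and their mean cross-validated accuracy. In practice a library SVM (for example scikit-learn's `SVC`) would replace the toy trainer, while the wrapper logic stays the same.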
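The grid-search baseline named in the abstract tunes SVM kernel parameters by scoring every combination on a fixed grid and keeping the best one. The sketch below is hypothetical and rests on several assumptions not taken from the paper: an RBF kernel with parameters C and gamma, a binary problem, 3-fold cross-validation, illustrative grid values, and the kernelized Pegasos algorithm as a dependency-free stand-in for an SVM solver. It maps C to the Pegasos regularizer through the standard correspondence lam = 1 / (C * n).

```python
import math
import random
from itertools import product


def rbf(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def train_kernel_pegasos(X, y, lam, gamma, iters=300, seed=0):
    """Kernelized Pegasos for a binary RBF SVM (labels +1/-1, no bias term).
    Returns alpha: how many times each sample triggered an update."""
    rng = random.Random(seed)
    alpha = [0] * len(X)
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        s = sum(a * yj * rbf(xj, X[i], gamma) for a, yj, xj in zip(alpha, y, X) if a)
        if y[i] * s / (lam * t) < 1:  # hinge loss active at step t
            alpha[i] += 1
    return alpha


def predict(alpha, X, y, x, gamma):
    # The positive 1/(lam*T) scale does not change the sign, so it is omitted.
    s = sum(a * yj * rbf(xj, x, gamma) for a, yj, xj in zip(alpha, y, X) if a)
    return 1 if s >= 0 else -1


def cv_accuracy(X, y, C, gamma, k=3):
    """Mean k-fold accuracy of the RBF SVM for one (C, gamma) pair."""
    accs = []
    for f in range(k):
        tr = [i for i in range(len(X)) if i % k != f]
        te = [i for i in range(len(X)) if i % k == f]
        Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
        # Assumption: C maps to the Pegasos regularizer as lam = 1 / (C * n).
        alpha = train_kernel_pegasos(Xtr, ytr, lam=1.0 / (C * len(tr)), gamma=gamma)
        hits = sum(predict(alpha, Xtr, ytr, X[i], gamma) == y[i] for i in te)
        accs.append(hits / len(te))
    return sum(accs) / k


def grid_search(X, y, Cs=(0.1, 1.0, 10.0), gammas=(0.01, 0.1, 1.0)):
    """Score every (C, gamma) pair on a hypothetical grid; return the best
    pair, its mean CV accuracy, and the full results table."""
    results = {(C, g): cv_accuracy(X, y, C, g) for C, g in product(Cs, gammas)}
    best = max(results, key=results.get)
    return best, results[best], results
```

The cost of this baseline grows multiplicatively with each added parameter and grid value, which is one reason a population-based search such as DE can be attractive for the same tuning task.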