Prediction of protein solvent accessibility using support vector machines
Author(s) - Yuan Zheng, Burrage Kevin, Mattick John S.
Publication year - 2002
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.10176
Subject(s) - support vector machine, sequence (biology), sliding window protocol, computer science, kernel (algebra), pattern recognition (psychology), artificial intelligence, artificial neural network, regression, machine learning, bayesian probability, solvent exposure, data mining, biological system, window (computing), solvent, mathematics, chemistry, statistics, operating system, biology, biochemistry, organic chemistry, combinatorics
A Support Vector Machine learning system has been trained to predict protein solvent accessibility from the primary structure. Different kernel functions and sliding window sizes were explored to determine how they affect prediction performance. Using a cut‐off threshold of 15%, which splits the dataset evenly (equal numbers of exposed and buried residues), the method achieved a prediction accuracy of 70.1% for single sequence input and 73.9% for multiple sequence alignment input. The prediction of three or more states of solvent accessibility was also studied and compared with other methods. The prediction accuracies are better than, or comparable to, those obtained by other methods such as neural networks, Bayesian classification, multiple linear regression, and information theory. In addition, the results suggest that this system may be combined with other prediction methods to achieve more reliable results, and that the Support Vector Machine method is a very useful tool for biological sequence analysis. Proteins 2002;48:566–570. © 2002 Wiley‐Liss, Inc.
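The sliding-window input and the two-state labelling described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The window size of 15, the 21-value one-hot slot per position (20 amino acids plus one flag for padding or unknown residues), the function names `encode_windows` and `label_states`, and the choice to treat a value exactly at the threshold as exposed are all assumptions made for illustration; only the 15% cut-off comes from the abstract.

```python
# Hypothetical sketch: sliding-window encoding for single-sequence input.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
SLOT = len(AMINO_ACIDS) + 1  # 20 residues + 1 padding/unknown flag (assumed)


def encode_windows(sequence, window=15):
    """Return one flat one-hot vector per residue, centred on that residue."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    vectors = []
    for centre in range(len(sequence)):
        vec = [0] * (window * SLOT)
        for offset in range(-half, half + 1):
            pos = centre + offset
            slot_start = (offset + half) * SLOT
            if 0 <= pos < len(sequence) and sequence[pos] in AA_INDEX:
                vec[slot_start + AA_INDEX[sequence[pos]]] = 1
            else:
                # Window runs past a terminus, or the residue is not standard.
                vec[slot_start + SLOT - 1] = 1
        vectors.append(vec)
    return vectors


def label_states(relative_accessibility, threshold=15.0):
    """Map relative accessibility (percent) to 1 = exposed, 0 = buried."""
    # Treating values equal to the threshold as exposed is an assumption.
    return [1 if rsa >= threshold else 0 for rsa in relative_accessibility]
```

The resulting vectors and labels could then be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC` with a chosen `kernel`, which is a stand-in here and not necessarily the software used in the paper. Varying `window` and the kernel would correspond to the exploration the abstract mentions.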