Cross‐validation of protein structural class prediction using statistical clustering and neural networks
Author(s) -
Metfessel Brent A.,
Connelly Donald P.,
Rich Stephen S.,
Saurugger Peter N.
Publication year - 1993
Publication title -
Protein Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.353
H-Index - 175
eISSN - 1469-896X
pISSN - 0961-8368
DOI - 10.1002/pro.5560020712
Subject(s) - cluster analysis, artificial neural network, vector quantization, pattern recognition (psychology), learning vector quantization, artificial intelligence, computer science, protein sequencing, algorithm, machine learning, peptide sequence, chemistry, biochemistry, gene
We present an approach to predicting protein structural class that uses amino acid composition and hydrophobic pattern frequency information as input to two types of neural networks: (1) a three‐layer back‐propagation network and (2) a learning vector quantization network. The results of these methods are compared to those obtained from a modified Euclidean statistical clustering algorithm. The protein sequence data used to drive these algorithms consist of the normalized frequency of up to 20 amino acid types and six hydrophobic amino acid patterns. From these frequency values the structural class predictions for each protein (all‐alpha, all‐beta, or alpha‐beta classes) are derived. Examples consisting of 64 previously classified proteins were randomly divided into multiple training (56 proteins) and test (8 proteins) sets. The best performing algorithm on the test sets was the learning vector quantization network using 17 inputs, obtaining a prediction accuracy of 80.2%. The Matthews correlation coefficients are statistically significant for all algorithms and all structural classes. The differences between algorithms are in general not statistically significant. These results show that information exists in protein primary sequences that is easily obtainable and useful for the prediction of protein structural class by neural networks as well as by standard statistical clustering algorithms.
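The data preparation and scoring steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`composition`, `split_train_test`, `matthews_cc`), the use of one-letter residue codes, and the one-vs-rest treatment of each structural class are all assumptions made here. The six hydrophobic pattern features are left out because the abstract does not define them, and no classifier is included.

```python
import math
import random
from collections import Counter

# The 20 standard amino acids as one-letter codes (an assumption about input format).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the normalized frequency of each of the 20 amino acid types."""
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Counter returns 0 for residues that never occur, so every type gets a value.
    return [counts[aa] / total for aa in AMINO_ACIDS]


def split_train_test(n_examples, n_test, rng):
    """Randomly partition example indices into (training, test) lists."""
    indices = list(range(n_examples))
    rng.shuffle(indices)
    return indices[n_test:], indices[:n_test]


def matthews_cc(true_labels, predicted_labels, positive_class):
    """Matthews correlation coefficient for one class against all others."""
    tp = tn = fp = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive_class:
            if truth == positive_class:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive_class:
                fn += 1
            else:
                tn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is taken as 0 when the denominator vanishes.
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the feature vector shape.
    features = composition("MKTAYIAKQR")
    print(len(features), round(sum(features), 6))

    # One random 56/8 partition of 64 examples, matching the sizes in the abstract.
    train_idx, test_idx = split_train_test(64, 8, random.Random(0))
    print(len(train_idx), len(test_idx))
```

Calling `split_train_test` repeatedly with different seeds would give the multiple training and test sets the abstract mentions, and `matthews_cc` would then be evaluated once per structural class (all-alpha, all-beta, alpha-beta) on each test set.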