Pattern recognition in the prediction of protein structure. I. Tripeptide conformational probabilities calculated from the amino acid sequence
Author(s) - Millard H. Lambert, Harold A. Scheraga
Publication year - 1989
Publication title - Journal of Computational Chemistry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.907
H-Index - 188
eISSN - 1096-987X
pISSN - 0192-8651
DOI - 10.1002/jcc.540100603
Subject(s) - tripeptide, gaussian, simple (philosophy), sequence (biology), property (philosophy), space (punctuation), series (stratigraphy), protein structure, algorithm, artificial intelligence, computer science, chemistry, computational chemistry, amino acid, biology, biochemistry, philosophy, epistemology, operating system, paleontology
A procedure that uses pattern recognition techniques to compute tripeptide conformational probabilities is described. The procedure differs in several respects from the many “secondary structure” prediction algorithms that have been published over the last 20 years. First, the procedure classifies tripeptides into 64 different conformational types, rather than just α, β and coil, as is commonly done. Thus, the procedure can attempt to predict regions of irregular structure. Second, the procedure uses the methods of pattern recognition, which are powerful but conceptually simple. In this approach, amino acid properties are used to map peptide sequences into a multivariate property space. Particular tripeptide conformations tend to map to particular regions of the property space. These regions are represented by multivariate gaussian distributions, where the parameters of the distributions are determined from tripeptides in the protein X‐ray data bank. Finally, rather than making simple predictions, the procedure computes probabilities. Tripeptide conformational probabilities are calculated in the multivariate property space using the gaussian distributions. In a prediction, the procedure might find that a particular tripeptide in a protein has a 36% chance of being in the ααα conformation, a 17% chance of being ααϵ, a 14% chance of being ααα*, etc. The α‐helical conformation is thus the most probable, but, in predicting the structure of the protein, a search algorithm should also consider some of the other possibilities. The values of the probability provide a rational basis for selecting from among the possible conformations. The second article of this series describes a procedure that uses the probabilities to direct a search through the conformational space of a protein. The third article of the series describes a procedure that generates actual three‐dimensional structures, and minimizes their energies. 
The three articles together describe a complete procedure, termed “pattern recognition‐based importance‐sampling minimization” (PRISM), for predicting protein structure from amino acid sequence.
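The probability calculation outlined in the abstract can be illustrated with a minimal sketch. It maps each tripeptide to a point in a property space, fits one multivariate Gaussian per conformational class, and turns the class densities into probabilities. Everything specific here is an assumption made for illustration: the two-value property table, the toy training tripeptides and their labels, the use of only three classes instead of 64, the ridge term added to each covariance matrix, and the use of Bayes' rule with class frequencies as priors. The abstract does not state how the published procedure handles any of these details.

```python
import numpy as np

# Hypothetical two-value property table per residue (illustrative numbers only,
# not the amino acid properties used in the article).
PROPERTIES = {
    "A": (1.8, 0.3),
    "G": (-0.4, 0.0),
    "L": (3.8, 0.9),
    "P": (-1.6, 0.5),
    "S": (-0.8, 0.2),
    "V": (4.2, 0.7),
}

# Made-up training tripeptides grouped by conformational label (assumption).
TOY_SAMPLES = {
    "ααα": ["ALA", "LAL", "AAL", "LLA"],
    "βββ": ["VSV", "SVS", "VVS"],
    "coil": ["GPG", "PGS", "GSP"],
}


def featurize(tripeptide):
    """Map a three-residue string to a 6-dimensional point in property space."""
    return np.array([v for res in tripeptide for v in PROPERTIES[res]], dtype=float)


def fit_gaussians(samples, ridge=1e-3):
    """Estimate (prior, mean, covariance) for each conformational class.

    The ridge term keeps each covariance matrix invertible when a class
    has fewer samples than dimensions (an assumption of this sketch).
    """
    total = sum(len(peptides) for peptides in samples.values())
    model = {}
    for conf, peptides in samples.items():
        X = np.stack([featurize(p) for p in peptides])
        mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
        model[conf] = (len(peptides) / total, mean, cov)
    return model


def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = d @ np.linalg.solve(cov, d)
    return -0.5 * (mahalanobis + logdet + len(x) * np.log(2.0 * np.pi))


def conformational_probabilities(tripeptide, model):
    """Return class probabilities for one tripeptide, most probable first."""
    x = featurize(tripeptide)
    confs = list(model)
    log_post = np.array(
        [np.log(model[c][0]) + log_gaussian(x, model[c][1], model[c][2]) for c in confs]
    )
    log_post -= log_post.max()  # subtract the maximum for numerical stability
    post = np.exp(log_post)
    post /= post.sum()
    ranked = sorted(zip(confs, post), key=lambda kv: -kv[1])
    return {conf: float(p) for conf, p in ranked}
```

Calling `conformational_probabilities("ALA", fit_gaussians(TOY_SAMPLES))` returns a ranked dictionary of probabilities that sum to one. A downstream search could then consider the top few conformations rather than only the single most probable one, which is the role the abstract assigns to the probabilities.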