New computational algorithm for the prediction of protein folding types
Author(s) - Štambuk Nikola, Konjevoda Paško
Publication year - 2001
Publication title - International Journal of Quantum Chemistry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.484
H-Index - 105
eISSN - 1097-461X
pISSN - 0020-7608
DOI - 10.1002/qua.1302
Subject(s) - computer science, algorithm, protein structure prediction, classifier (uml), protein sequencing, artificial intelligence, protein folding, pattern recognition (psychology), protein structure, peptide sequence, biology, genetics, biochemistry, gene
We present a new computational algorithm for the prediction of protein secondary structure. The method enables the evaluation of α‐ and β‐protein folding types from nucleotide sequences. The procedure is based on the reflected Gray code algorithm of nucleotide–amino acid relationships and extends Swanson's procedure (Ref. 4). It is shown that a six‐digit binary notation of each codon enables the prediction of α‐ and β‐protein folds by means of an error‐correcting linear block triple‐check code. We tested the validity of the method on a test set of 140 proteins (70 α‐ and 70 β‐folds). The test set consisted of standard α‐ and β‐protein classes from the Jpred and SCOP databases, with nucleotide sequences available in the GenBank database. Logistic regression analysis, based on 39 dipeptide addresses derived by the error‐correcting coding procedure, classified α‐ and β‐protein folds with 100% accuracy (p < 0.1). A classification tree and the machine learning sequential minimal optimization (SMO) classifier confirmed the results with 97.1% and 90% accurate classification, respectively. Protein fold prediction quality, tested by leave‐one‐out cross‐validation, was a satisfactory 82.1% for the logistic regression and 81.4% for the SMO classifier. The presented procedure of computational analysis can be helpful in detecting the type of protein folding from newly sequenced exon regions. The method enables quick, simple, and accurate prediction of α‐ and β‐protein folds from the nucleotide sequence on a personal computer. © 2001 John Wiley & Sons, Inc. Int J Quant Chem 84: 13–22, 2001
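To make the six‐digit binary codon notation concrete, the following minimal Python sketch assigns each nucleotide two bits of a reflected Gray code and concatenates them per codon. The nucleotide order `UCAG` and the helper names (`reflected_gray`, `codon_to_bits`, `sequence_to_codon_bits`) are illustrative assumptions, not the assignment used in the paper. The error‐correcting triple‐check code and the dipeptide addresses are not reproduced here.

```python
def reflected_gray(n_bits):
    """Return the reflected Gray code sequence for n_bits as bit strings."""
    # i ^ (i >> 1) is the standard binary-to-Gray conversion.
    return [format(i ^ (i >> 1), f"0{n_bits}b") for i in range(2 ** n_bits)]


# ASSUMPTION: this nucleotide order is hypothetical and chosen only for
# illustration; the paper's actual nucleotide-to-bits assignment may differ.
NUCLEOTIDE_ORDER = "UCAG"
NUCLEOTIDE_BITS = dict(zip(NUCLEOTIDE_ORDER, reflected_gray(2)))


def codon_to_bits(codon):
    """Encode one codon (three nucleotides) as a six-digit binary string."""
    codon = codon.upper().replace("T", "U")  # accept DNA or RNA letters
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three nucleotides")
    return "".join(NUCLEOTIDE_BITS[base] for base in codon)


def sequence_to_codon_bits(sequence):
    """Split a coding sequence into codons and encode each one.

    Trailing nucleotides that do not fill a complete codon are ignored.
    """
    usable = len(sequence) - len(sequence) % 3
    return [codon_to_bits(sequence[i:i + 3]) for i in range(0, usable, 3)]


if __name__ == "__main__":
    print(NUCLEOTIDE_BITS)
    print(sequence_to_codon_bits("ATGGCC"))
```

Under this assumed ordering, neighbouring nucleotides in `UCAG` differ by a single bit, which is the defining property of a Gray code. Any classifier such as logistic regression would operate on features derived from these bit strings, not on the strings themselves.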