β‐Hairpin prediction with quadratic discriminant analysis using diversity measure
Author(s) - Zou Dongsheng, He Zhongshi, He Jingyuan
Publication year - 2009
Publication title - Journal of Computational Chemistry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.907
H-Index - 188
eISSN - 1096-987X
pISSN - 0192-8651
DOI - 10.1002/jcc.21229
Subject(s) - discriminant, pattern recognition (psychology), mathematics, linear discriminant analysis, feature (linguistics), artificial intelligence, quadratic classifier, computer science, quadratic equation, correlation, beta diversity, statistics, biology, philosophy, linguistics, geometry, support vector machine, biodiversity, ecology
On the basis of the features of protein sequential patterns, we used the increment of diversity combined with quadratic discriminant analysis (IDQD) to predict β‐hairpin motifs in protein sequences. Three rules are used to extract raw fixed‐length sequential patterns of β‐β motifs. Basic amino acid composition, dipeptide components, and amino acid composition distribution are combined to represent the compositional features. Eighteen feature variables for a sequential pattern to be predicted are defined in terms of the increment of diversity (ID). They are integrated in a single formal framework given by IDQD. The method is trained and tested on the ArchDB40 dataset, which contains 3088 proteins. On the independent testing dataset, the overall prediction accuracy and Matthews correlation coefficient are 81.7% and 0.60, respectively. In addition, a higher accuracy of 84.5% and a Matthews correlation coefficient of 0.68 are obtained for the independent testing dataset on a dataset previously used by Kumar et al. (Nucleic Acids Res 2005, 33, 154), which contains 2088 proteins. For a fair assessment of our method, the performance is also evaluated on all 63 proteins used in CASP6, where the overall prediction accuracy is 74.2% for the independent testing dataset. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009
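The abstract names the increment of diversity (ID) and the Matthews correlation coefficient without giving formulas. The following is a minimal sketch, assuming the commonly used definition of the diversity measure, D(X) = N log N − Σ n_i log n_i, with ID(X, Y) = D(X + Y) − D(X) − D(Y). The function names and the 20-letter amino acid alphabet are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's eighteen feature variables or its quadratic discriminant step.

```python
import math
from collections import Counter

# Assumption: the standard 20-letter amino acid alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def diversity(counts):
    """Diversity measure D(X) = N log N - sum(n_i log n_i), with 0 log 0 = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count vectors of equal length."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def composition(sequence):
    """Amino acid count vector of a sequence (hypothetical helper)."""
    counts = Counter(sequence)
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Under this definition, ID is zero when two count vectors have the same proportions and grows as the compositions diverge, so a small ID between a query pattern and a class-level count vector suggests similarity to that class. For example, `increment_of_diversity(composition(query), class_counts)` would give one such feature, where `query` and `class_counts` are placeholders for a candidate pattern and the pooled counts of a training class.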