Mining of protein contact maps for protein fold prediction
Author(s) - Bhavani S Durga, K Suvarnavani, Sinha Somdatta
Publication year - 2011
Publication title - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.506
H-Index - 47
eISSN - 1942-4795
pISSN - 1942-4787
DOI - 10.1002/widm.35
Subject(s) - fold (higher order function) , computer science , protein data bank , artificial intelligence , simple (philosophy) , protein structure , subdivision , decision tree , protein structure prediction , machine learning , data mining , pattern recognition (psychology) , biology , geography , biochemistry , programming language , philosophy , epistemology , archaeology
The three‐dimensional structure of a protein is essential for carrying out its biophysical and biochemical functions in a cell. Approaches to protein structure/fold prediction typically extract features from the amino acid sequence and then apply machine learning methods to the resulting classification problem. Protein contact maps are two‐dimensional representations of the contacts among the amino acid residues in the folded protein structure. This paper highlights the need for a systematic study of these contact networks. Mining contact maps to derive features that carry fold information offers a new mechanism for fold discovery from the protein sequence via the contact maps. These ideas are explored in the structural class of all‐alpha proteins to identify structural elements. A simple and computationally inexpensive algorithm based on the triangle subdivision method is proposed to extract additional features from the contact map. The method successfully characterizes the off‐diagonal interactions in the contact map for predicting specific ‘folds’. The decision tree classification results show great promise for developing a new and simple tool for the challenging problem of fold prediction.

© 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 362–368 DOI: 10.1002/widm.35

This article is categorized under: Algorithmic Development > Biological Data Mining Technologies > Classification Technologies > Machine Learning
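
To make the notion of a contact map concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes one representative coordinate per residue and marks two residues as in contact when they lie within a distance cutoff; the 8 Å default, the minimum sequence separation of 4, the function names, and the toy coordinates are all illustrative assumptions that do not come from the abstract. The second function counts off‐diagonal contacts only; it does not implement the triangle subdivision method, whose details the abstract does not give.

```python
import math


def contact_map(coords, threshold=8.0):
    """Build a symmetric 0/1 contact map from per-residue coordinates.

    Entry [i][j] is 1 when residues i and j lie within `threshold`
    (assumed here to be in angstroms). The diagonal is left at 0.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def off_diagonal_contacts(cmap, min_separation=4):
    """Count contacts between residues at least `min_separation`
    positions apart in sequence, i.e. away from the main diagonal."""
    n = len(cmap)
    return sum(
        cmap[i][j]
        for i in range(n)
        for j in range(i + min_separation, n)
    )


# Hypothetical toy chain: eight points forming a hairpin-like turn.
# These are made-up coordinates, not taken from any real structure.
toy_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 3.8, 0.0), (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]

toy_map = contact_map(toy_coords)
long_range = off_diagonal_contacts(toy_map)
print("off-diagonal contacts:", long_range)
```

A count like this is one simple example of a feature that could be fed to a classifier; the choice of cutoff and separation changes which contacts are treated as off‐diagonal.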