Premium
Information‐theoretic analysis of the reference state in contact potentials used for protein structure prediction
Author(s) -
Solis Armando D.,
Rackovsky Shalom R.
Publication year - 2009
Publication title -
proteins: structure, function, and bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.22652
Subject(s) - threading (protein sequence) , divergence (linguistics) , protein structure prediction , computer science , decoy , set (abstract data type) , sequence (biology) , algorithm , function (biology) , protein structure , mathematics , statistical physics , biological system , biology , physics , genetics , linguistics , philosophy , receptor , programming language , biochemistry
Using information‐theoretic concepts, we examine the role of the reference state, a crucial component of empirical potential functions, in protein fold recognition. We derive an information‐based connection between the probability distribution functions of the reference state and those that characterize the decoy set used in threading. In examining commonly used contact reference states, we find that the quasi‐chemical approximation is informatically superior to other variant models designed to include characteristics of real protein chains, such as finite length and variable amino acid composition from protein to protein. We observe that in these variant models, the total divergence, the operative function that quantifies discrimination, decreases along with threading performance. We find that any amount of nativeness encoded in the reference state model does not significantly improve threading performance. A promising avenue for the development of better potentials is suggested by our information‐theoretic analysis of the action of contact potentials on individual protein sequences. Our results show that contact potentials perform better when the compositional properties of the data set used to derive the score function probabilities are similar to the properties of the sequence of interest. Results also suggest to use only sequences of similar composition in deriving contact potentials, to tailor the contact potential specifically for a test sequence. Proteins 2010. © 2009 Wiley‐Liss, Inc.