Influence of protein structure databases on the predictive power of statistical pair potentials
Author(s) - Furuichi Emiko, Koehl Patrice
Publication year - 1998
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/(sici)1097-0134(19980501)31:2<139::aid-prot4>3.0.co;2-h
Subject(s) - statistical potential, protein structure database, database, protein structure prediction, computer science, protein folding, cutoff, protein structure, folding (dsp implementation), chemistry, physics, quantum mechanics, biochemistry, sequence database, gene, electrical engineering, engineering
A long-standing goal in protein structure studies is the development of reliable energy functions that can be used both to verify protein models derived from experimental constraints and to carry out theoretical protein folding and inverse folding computer experiments. In that respect, knowledge-based statistical pair potentials have recently attracted considerable interest, mainly because they include the essential features of protein structures, as well as solvent effects, at a low computing cost. However, the basis on which statistical potentials are derived has been questioned. In this paper, we investigate statistical pair potentials derived from protein three-dimensional structures, addressing in particular questions related to the form of these potentials and to the content of the database from which they are derived. We have shown that statistical pair potentials depend on the size of the proteins included in the database, and that this dependence can be reduced by considering only pairs of residues close in space (i.e., with a cutoff of 8 Å). We have also shown that statistical potentials carry a memory of the quality of the database in terms of the amount and diversity of secondary structure it contains. We find, for example, that potentials derived from a database containing α-proteins perform best only on α-proteins in fold recognition computer experiments. We believe that this is an overall weakness of these potentials, which must be kept in mind when constructing a database. Proteins 31:139–149, 1998. © 1998 Wiley-Liss, Inc.
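To illustrate what deriving a pair potential from a structure database involves, the following is a minimal sketch of a generic inverse-Boltzmann contact potential using the 8 Å cutoff mentioned in the abstract. It is not the authors' formulation, which the abstract does not specify. The function names, the input layout (one coordinate per residue), the reference state (random pairing weighted by how often each residue type appears in contacts), and the absence of any sequence-separation filter are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

CUTOFF = 8.0  # distance cutoff in angstroms, as quoted in the abstract


def contact_counts(structures, cutoff=CUTOFF):
    """Count residue-type pairs closer than `cutoff` over a database.

    `structures` is a list of proteins; each protein is a list of
    (residue_type, (x, y, z)) tuples (hypothetical input layout).
    """
    pair_counts = Counter()  # contacts per unordered residue-type pair
    type_counts = Counter()  # contact participations per residue type
    for protein in structures:
        for (a, pos_a), (b, pos_b) in combinations(protein, 2):
            if math.dist(pos_a, pos_b) <= cutoff:
                pair_counts[tuple(sorted((a, b)))] += 1
                type_counts[a] += 1
                type_counts[b] += 1
    return pair_counts, type_counts


def pair_potential(pair_counts, type_counts):
    """Return E(a, b) = -ln(f_obs / f_exp) in units of kT.

    f_exp assumes residue types pair at random in proportion to how
    often each type takes part in a contact (an assumed reference state).
    """
    total_pairs = sum(pair_counts.values())
    total_types = sum(type_counts.values())
    potential = {}
    for (a, b), n in pair_counts.items():
        f_obs = n / total_pairs
        f_a = type_counts[a] / total_types
        f_b = type_counts[b] / total_types
        # Unlike pairs can form in two orders, hence the factor of 2.
        f_exp = f_a * f_b if a == b else 2.0 * f_a * f_b
        potential[(a, b)] = -math.log(f_obs / f_exp)
    return potential
```

Because both the observed and expected frequencies are computed from the database itself, any bias in its composition (for example, a predominance of α-proteins) is carried directly into the resulting energies, which is the kind of dependence the abstract describes.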