Accurate prediction for atomic‐level protein design and its application in diversifying the near‐optimal sequence space
Author(s) - Fromer, Menachem; Yanover, Chen
Publication year - 2008
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.22280
Subject(s) - protein design , sequence (biology) , protein structure prediction , computer science , sequence space , benchmark (surveying) , energy (signal processing) , algorithm , probabilistic logic , set (abstract data type) , alignment free sequence analysis , protein structure , sequence alignment , mathematics , peptide sequence , artificial intelligence , biology , gene , geography , statistics , genetics , geodesy , pure mathematics , banach space , programming language , biochemistry
The task of engineering a protein to assume a target three‐dimensional structure is known as protein design. Computational search algorithms are devised to predict a minimal energy amino acid sequence for a particular structure. In practice, however, an ensemble of low‐energy sequences is often sought. This is done primarily because an individual predicted low‐energy sequence may not fold to the target structure, owing both to inaccuracies in modeling protein energetics and to the nonoptimal nature of the search algorithms employed. Additionally, some low‐energy sequences may be overly stable and thus lack the dynamic flexibility required for biological functionality. Furthermore, the investigation of low‐energy sequence ensembles will provide crucial insights into the pseudo‐physical energy force fields that have been derived to describe structural energetics for protein design. Significantly, numerous studies have predicted low‐energy sequences, which were subsequently synthesized and demonstrated to fold to desired structures. However, the sequence space that such energy functions define as compatible with a target structure has not been characterized in full detail. This issue is critical if protein design scientists are to continue using these force fields successfully at an ever‐increasing pace and scale. In this paper, we present a conceptually novel algorithm that rapidly predicts the set of lowest energy sequences for a given structure. Based on the theory of probabilistic graphical models, it performs efficient inspection and partitioning of the near‐optimal sequence space, without making any assumptions of positional independence. We benchmark its performance on a diverse set of relevant protein design examples and show that it consistently yields sequences of lower energy than those derived from state‐of‐the‐art techniques.
Thus, we find that previously presented search techniques do not depict the low‐energy space as precisely. Examination of the predicted ensembles indicates that, for each structure, the amino acid identity at a majority of positions must be chosen extremely selectively so as not to incur significant energetic penalties. We investigate this high degree of similarity and demonstrate how more diverse near‐optimal sequences can be predicted in order to systematically overcome this bottleneck for computational design. Finally, we exploit this in‐depth analysis of a collection of the lowest energy sequences to suggest an explanation for previously observed experimental design results. The novel methodologies introduced here accurately portray the sequence space compatible with a protein structure and further supply a scheme to yield heterogeneous low‐energy sequences, thus providing a powerful instrument for future work on protein design. Proteins 2009. © 2008 Wiley‐Liss, Inc.
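
The abstract does not spell out the energy function or the search procedure, so the following is only a toy illustration of what "the set of lowest energy sequences" and "more diverse near‐optimal sequences" can mean when positions are not independent. It assumes a hypothetical three-position design problem with made-up singleton and pairwise energy tables, ranks all sequences by exhaustive enumeration, and then applies a simple Hamming-distance filter. Exhaustive enumeration stands in for the paper's graphical-model algorithm and is feasible only for tiny problems; all names and numbers below are assumptions.

```python
import heapq
from itertools import product

# Hypothetical toy problem: 3 positions, 3 allowed amino acids each.
# All energies are invented for illustration (arbitrary units).
ALPHABET = "ALK"
SINGLE = [
    {"A": 0.0, "L": -1.0, "K": 0.5},
    {"A": -0.5, "L": 0.0, "K": -0.2},
    {"A": 0.2, "L": -0.3, "K": -1.0},
]
# Pairwise terms couple positions, so positions are not independent.
# Amino acid pairs not listed contribute 0.
PAIR = {
    (0, 1): {("L", "L"): 1.5, ("L", "A"): -0.5},
    (1, 2): {("A", "K"): -0.8, ("K", "K"): 2.0},
}


def energy(seq):
    """Sum of singleton terms plus pairwise terms for one sequence."""
    total = sum(SINGLE[i][aa] for i, aa in enumerate(seq))
    for (i, j), table in PAIR.items():
        total += table.get((seq[i], seq[j]), 0.0)
    return total


def lowest_energy_sequences(k):
    """Return the k lowest-energy sequences by brute-force enumeration."""
    all_seqs = ("".join(s) for s in product(ALPHABET, repeat=len(SINGLE)))
    return heapq.nsmallest(k, all_seqs, key=energy)


def diversify(ranked, min_hamming):
    """Greedily keep sequences that differ from every kept sequence
    in at least min_hamming positions (a hypothetical diversity filter)."""
    chosen = []
    for seq in ranked:
        if all(sum(a != b for a, b in zip(seq, c)) >= min_hamming
               for c in chosen):
            chosen.append(seq)
    return chosen


if __name__ == "__main__":
    ranked = lowest_energy_sequences(10)
    for seq in ranked:
        print(seq, round(energy(seq), 2))
    print("diverse subset:", diversify(ranked, min_hamming=2))
```

In this toy setting the top-ranked sequences tend to differ from the optimum at a single position, which loosely mirrors the high similarity among low-energy sequences that the abstract reports; the filter trades some energy for sequence diversity.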