Identifying and reducing error in cluster‐expansion approximations of protein energies
Author(s) -
Hahn Seungsoo,
Ashenberg Orr,
Grigoryan Gevorg,
Keating Amy E.
Publication year - 2010
Publication title -
Journal of Computational Chemistry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.907
H-Index - 188
eISSN - 1096-987X
pISSN - 0192-8651
DOI - 10.1002/jcc.21585
Subject(s) - sequence (biology), algorithm, computer science, sequence space, protein structure prediction, cluster expansion, cluster (spacecraft), PDZ domain, set (abstract data type), biological system, protein structure, mathematics, physics, chemistry, discrete mathematics, quantum mechanics, nuclear magnetic resonance, biochemistry, Banach space, programming language, biology
Protein design involves searching a vast space for sequences that are compatible with a defined structure. This can pose significant computational challenges. Cluster expansion is a technique that can accelerate the evaluation of protein energies by generating a simple functional relationship between sequence and energy. The method consists of several steps. First, for a given protein structure, a training set of sequences with known energies is generated. Next, this training set is used to expand energy as a function of clusters consisting of single residues, residue pairs, and higher order terms, if required. The accuracy of the sequence‐based expansion is monitored and improved using cross‐validation testing and iterative inclusion of additional clusters. As a trade‐off for evaluation speed, the cluster‐expansion approximation causes prediction errors, which can be reduced by including more training sequences, including higher order terms in the expansion, and/or reducing the sequence space described by the cluster expansion. This article analyzes the sources of error and introduces a method whereby accuracy can be improved by judiciously reducing the described sequence space. The method is applied to describe the sequence–stability relationship for several protein structures: coiled‐coil dimers and trimers, a PDZ domain, and T4 lysozyme as examples with computationally derived energies, and SH3 domains in amphiphysin‐1 and endophilin‐1 as examples where the expanded pseudo‐energies are obtained from experiments. Our open‐source software package Cluster Expansion Version 1.0 allows users to expand their own energy function of interest and thereby apply cluster expansion to custom problems in protein design. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010
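
The steps summarized in the abstract (generate a training set, fit point and pair cluster terms, monitor accuracy by cross-validation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two-letter alphabet, the three-position toy protein, the `toy_energy` function and its coefficients, and the small `ridge` term used to keep the least-squares system solvable. It is not the interface or algorithm of the authors' Cluster Expansion Version 1.0 package.

```python
from itertools import combinations, product
from math import sqrt

ALPHABET = "AB"  # hypothetical two-letter alphabet; "A" is the reference residue
LENGTH = 3       # hypothetical number of variable positions


def toy_energy(seq):
    """Stand-in for an expensive energy function (illustrative values only)."""
    b = [1.0 if aa == "B" else 0.0 for aa in seq]
    return (1.0 + 0.5 * b[0] - 0.3 * b[1] + 0.2 * b[2]
            + 0.8 * b[0] * b[1] - 0.4 * b[1] * b[2])


def features(seq, max_order):
    """Constant term plus one indicator per cluster of up to max_order sites."""
    row = [1.0]
    for order in range(1, max_order + 1):
        for sites in combinations(range(LENGTH), order):
            for residues in product(ALPHABET[1:], repeat=order):
                hit = all(seq[s] == r for s, r in zip(sites, residues))
                row.append(1.0 if hit else 0.0)
    return row


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(seqs, energies, max_order, ridge=1e-8):
    """Least-squares cluster coefficients via (lightly regularized) normal equations."""
    X = [features(s, max_order) for s in seqs]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * e for row, e in zip(X, energies)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, seq, max_order):
    """Evaluate the fitted expansion: a dot product of coefficients and indicators."""
    return sum(c * f for c, f in zip(coeffs, features(seq, max_order)))


def loo_cv_rmse(seqs, energies, max_order):
    """Leave-one-out cross-validation error of the fitted expansion."""
    sq = 0.0
    for i in range(len(seqs)):
        coeffs = fit(seqs[:i] + seqs[i + 1:], energies[:i] + energies[i + 1:], max_order)
        sq += (predict(coeffs, seqs[i], max_order) - energies[i]) ** 2
    return sqrt(sq / len(seqs))


sequences = list(product(ALPHABET, repeat=LENGTH))
energies = [toy_energy(s) for s in sequences]
cv_point = loo_cv_rmse(sequences, energies, max_order=1)
cv_pair = loo_cv_rmse(sequences, energies, max_order=2)
print(f"CV RMSE, point terms only:   {cv_point:.4f}")
print(f"CV RMSE, point + pair terms: {cv_pair:.4f}")
```

Because the toy energy contains pair interactions, the point-only expansion leaves a sizable cross-validation error, while adding pair clusters removes it almost entirely. This mirrors, in a deliberately tiny setting, the abstract's remark that including higher order terms is one way to reduce prediction error.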
