A simple clustering algorithm can be accurate enough for use in calculations of pKs in macromolecules
Author(s) - Jonathan Myers, Greg Grothaus, Shivaram Narayanan, Alexey Onufriev
Publication year - 2006
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.20922
The structure and function of macromolecules depend critically on the ionization states of their acidic and basic groups. Most current structure-based theoretical methods that predict the pK values of ionizable groups in macromolecules include, as one of the key steps, a computation of the partition sum (Boltzmann average) over all possible protonation microstates. Because the number of these microstates grows exponentially with the number of ionizable groups (2^N microstates for N sites that can each be protonated or deprotonated), direct computation of the sum is not feasible for many typical proteins, which may have tens or even hundreds of ionizable groups. We have tested a simple and robust approximate algorithm for computing these partition sums for macromolecules. The method subdivides the interacting sites into independent clusters based on the strength of site–site electrostatic interactions. The resulting partition function factorizes into computationally manageable components. Two variants of the approach are presented and validated on a representative test set of 602 proteins by comparing the pK1/2 values computed by the proposed method with those obtained by the standard Monte Carlo approach, used as a reference. With 95% confidence, the relative error introduced by the more accurate of the two methods is less than 0.25 pK units. With typical settings, the algorithms are one to two orders of magnitude faster than the Monte Carlo method. A graphical representation is introduced that visualizes the clusters of strong site–site interactions in the context of the three-dimensional (3D) structure of the macromolecule, facilitating the identification of functionally important clusters of ionizable groups; the approach is illustrated on two proteins, bacteriorhodopsin and myoglobin. Proteins 2006. © 2006 Wiley-Liss, Inc.
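The clustering idea summarized above can be illustrated with a minimal Python sketch. It assumes a simplified two-state model in which each site i has an intrinsic pK (`pKint`) and a symmetric pairwise coupling matrix `W` (in units of kT) that penalizes simultaneous protonation. It also assumes that clusters are the connected components of the graph of couplings with |W_ij| at or above a hypothetical cutoff `threshold`. The function names, the energy model, and the cutoff rule are illustrative assumptions, not the authors' two variants. The reference here is exact enumeration of a toy system rather than Monte Carlo.

```python
import itertools
import math

LN10 = math.log(10.0)


def clusters_by_threshold(W, threshold):
    """Hypothetical clustering rule: sites i and j share a cluster if they are
    linked by a chain of couplings with |W[i][j]| >= threshold (union-find)."""
    n = len(W)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(W[i][j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def cluster_fractions(sites, pKint, W, pH):
    """Exact Boltzmann average over the 2**len(sites) microstates of one cluster.
    Assumed energy in kT: E(x) = sum_i x_i*ln10*(pH - pKint_i) + sum_{i<j} W_ij*x_i*x_j,
    where x_i = 1 means site i is protonated."""
    n = len(sites)
    states = list(itertools.product((0, 1), repeat=n))
    weights = []
    for x in states:
        e = sum(x[a] * LN10 * (pH - pKint[sites[a]]) for a in range(n))
        e += sum(W[sites[a]][sites[b]] * x[a] * x[b]
                 for a in range(n) for b in range(a + 1, n))
        weights.append(math.exp(-e))
    Z = sum(weights)  # partition function of this cluster alone
    return {sites[a]: sum(w * x[a] for w, x in zip(weights, states)) / Z
            for a in range(n)}


def protonation_fractions(pKint, W, pH, threshold):
    """Approximate <x_i>: the total partition function is taken as the product
    of independent cluster partition functions, so couplings between different
    clusters are dropped. threshold=0.0 puts all sites in one cluster (exact)."""
    frac = {}
    for sites in clusters_by_threshold(W, threshold):
        frac.update(cluster_fractions(sites, pKint, W, pH))
    return [frac[i] for i in range(len(pKint))]


def pk_half(pKint, W, site, threshold, lo=0.0, hi=14.0, step=0.01):
    """pK1/2: first pH on a grid (linearly interpolated) where <x_site> falls to 0.5."""
    prev_pH = lo
    prev_f = protonation_fractions(pKint, W, lo, threshold)[site]
    pH = lo
    while pH < hi:
        pH = min(pH + step, hi)
        f = protonation_fractions(pKint, W, pH, threshold)[site]
        if prev_f >= 0.5 > f:
            return prev_pH + (prev_f - 0.5) * (pH - prev_pH) / (prev_f - f)
        prev_pH, prev_f = pH, f
    return None  # no crossing of 0.5 in [lo, hi]
```

For example, with `pKint = [4.0, 4.5, 10.0]`, a strong 2.0 kT coupling between sites 0 and 1, and a weak 0.05 kT coupling between sites 1 and 2, a threshold of 0.5 gives the clusters `[[0, 1], [2]]`. Setting the threshold to 0.0 merges all sites into one cluster and recovers exact enumeration, so the difference between the two `pk_half` results measures the error from dropping the weak coupling. The cost of exact enumeration is exponential only in the size of the largest cluster rather than in the total number of sites.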