Bayesian model aggregation for ensemble‐based estimates of protein pKa values
Author(s) -
Gosink Luke J.,
Hogan Emilie A.,
Pulsipher Trenton C.,
Baker Nathan A.
Publication year - 2014
Publication title -
Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.24390
Subject(s) - computer science , generalizability theory , ensemble forecasting , bayesian probability , ranging , aggregate (composite) , ensemble learning , machine learning , protein structure prediction , artificial intelligence , data mining , protein structure , chemistry , mathematics , statistics , telecommunications , biochemistry , materials science , composite material
This article investigates an ensemble‐based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid pKa predictions. Structure‐based pKa calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exists for pKa prediction, ranging from empirical statistical models to ab initio quantum mechanical approaches. However, each of these methods is based on a set of conceptual assumptions that can affect a model's accuracy and generalizability for pKa prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative, and the pKa measurements are based on experimental work conducted by the García‐Moreno lab. Our cross‐validation study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods, with improvements ranging from 45 to 73% over other method classes. This study also compares BMA's predictive performance to other ensemble‐based techniques and demonstrates that BMA can outperform these approaches, with improvements ranging from 27 to 60%. This work illustrates a possible new mechanism for improving the accuracy of pKa prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy. Proteins 2014; 82:354–363. © 2013 Wiley Periodicals, Inc.
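To make the idea of an aggregated estimate concrete, the following is a minimal sketch of one simple form of Bayesian model averaging. It is not the fitting procedure used in the article, which the abstract does not describe. It assumes a uniform prior over models, independent Gaussian errors with a fixed shared standard deviation `sigma`, and weights taken as the normalized posterior model probabilities. The function names, the toy pKa values, and the choice of three models instead of eleven are all hypothetical.

```python
import math


def bma_weights(predictions, observed, sigma=1.0):
    """Posterior model weights under a uniform prior and Gaussian errors.

    predictions: one list of predicted values per model (training residues)
    observed:    measured values for the same residues
    sigma:       assumed error standard deviation, shared by all models
    """
    log_liks = []
    for preds in predictions:
        sse = sum((p - y) ** 2 for p, y in zip(preds, observed))
        # Gaussian log-likelihood up to a constant that is identical
        # for every model and therefore cancels on normalization.
        log_liks.append(-sse / (2.0 * sigma ** 2))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_liks)
    unnormalized = [math.exp(ll - top) for ll in log_liks]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def bma_predict(weights, new_predictions):
    """Aggregate estimate: weighted average of the per-model predictions."""
    return sum(w * p for w, p in zip(weights, new_predictions))


# Hypothetical toy data: three models, three training residues.
observed = [4.0, 6.5, 7.2]
predictions = [
    [4.1, 6.4, 7.0],  # model 0: close to the measurements
    [5.0, 5.5, 8.0],  # model 1: larger errors
    [3.0, 7.5, 6.0],  # model 2: largest errors
]
weights = bma_weights(predictions, observed, sigma=0.5)

# Each model's prediction for a new, unmeasured residue (hypothetical).
estimate = bma_predict(weights, [5.0, 6.0, 4.0])
print(weights, estimate)
```

In this sketch the model with the smallest squared error on the training residues receives most of the weight, so the aggregate estimate stays close to that model's prediction while still drawing on the others.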