Open Access

Bayesian Active Learning for Optimization and Uncertainty Quantification in Protein Docking

Author(s) - Yue Cao, Yang Shen

Publication year - 2020

Publication title - Journal of Chemical Theory and Computation

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.001

H-Index - 185

eISSN - 1549-9626

pISSN - 1549-9618

DOI - 10.1021/acs.jctc.0c00476

Subject(s) - bayesian optimization, docking (animal), computer science, bayesian probability, particle swarm optimization, algorithm, artificial intelligence, mathematical optimization, mathematics, medicine, nursing

Ab initio protein docking represents a major challenge: optimizing a noisy and costly black-box function in a high-dimensional space. Despite progress in this field, rigorous uncertainty quantification (UQ) is lacking. To fill the gap, we introduce a novel algorithm, Bayesian active learning (BAL), for optimization and UQ of such black-box functions, with applications to flexible protein docking. BAL directly models the posterior distribution of the global optimum (i.e., the native structure), with active sampling and posterior estimation iteratively feeding each other. Furthermore, it uses complex normal modes to span a homogeneous, Euclidean conformation space suitable for high-dimensional optimization, and it constructs funnel-like energy models for quality estimation of encounter complexes. Over a protein-docking benchmark set and a CAPRI set that includes homology docking, we establish that BAL significantly improves upon both the starting points from rigid docking and refinements by particle swarm optimization, providing a top-3 near-native prediction for one-third of the targets. Quality assessment empowered with UQ leads to tight quality intervals, with a half range around 25% of the actual interface root-mean-square deviation at a confidence level of 85%. BAL's estimated probability of a prediction being near-native achieves a binary-classification AUROC of 0.93 and an area under the precision-recall curve over 0.60 (compared to 0.50 and 0.14, respectively, by chance), which also improves the ranking of predictions. This study represents the first UQ solution for protein docking, with rigorous theoretical frameworks and comprehensive empirical assessments.
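The chance-level figures quoted in the abstract (0.50 for AUROC, 0.14 for the precision-recall area) can be read with a small sketch. For a scorer that carries no information, AUROC is expected to sit near 0.5 regardless of class balance, while the area under the precision-recall curve is expected to sit near the fraction of positives. Under that reading, a 0.14 baseline would correspond to roughly 14% of the evaluated predictions being near-native; this is an inference from the standard definition, not a figure stated by the authors. The example below uses synthetic labels and scores only. The function names, the sample size, and the Gaussian score model are illustrative assumptions and do not come from the paper.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation of AUROC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / hits


# Synthetic data (assumption): about 14% of predictions are "near-native".
rng = random.Random(0)
n = 2000
prevalence = 0.14
labels = [1 if rng.random() < prevalence else 0 for _ in range(n)]

# A scorer with no information, and a hypothetical informative scorer
# whose scores for positives are shifted upward.
random_scores = [rng.random() for _ in range(n)]
informative_scores = [rng.gauss(2.0 if y else 0.0, 1.0) for y in labels]

observed_prevalence = sum(labels) / n
print("prevalence          ", round(observed_prevalence, 3))
print("random: AUROC, AP   ", round(auroc(labels, random_scores), 3),
      round(average_precision(labels, random_scores), 3))
print("informed: AUROC, AP ", round(auroc(labels, informative_scores), 3),
      round(average_precision(labels, informative_scores), 3))
```

Because the precision-recall baseline depends on class balance, the reported area of over 0.60 is best judged against 0.14 rather than against 0.5.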
