Open Access
Confidence in Inactive and Active Predictions from Structural Alerts
Author(s) - Andrew J. Wedlake, Timothy E. H. Allen, Jonathan M. Goodman, Steve Gutsell, Predrag Kukić, Paul Russell
Publication year - 2020
Publication title - Chemical Research in Toxicology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.031
H-Index - 156
eISSN - 1520-5010
pISSN - 0893-228X
DOI - 10.1021/acs.chemrestox.0c00332
Subject(s) - set (abstract data type) , computer science , identification (biology) , similarity (geometry) , data mining , confidence interval , test set , low confidence , data set , cutoff , quantitative structure–activity relationship , machine learning , measure (data warehouse) , artificial intelligence , statistics , mathematics , biology , psychology , social psychology , botany , physics , quantum mechanics , image (mathematics) , programming language
Having a measure of confidence in computational predictions of biological activity from in silico tools is vital when making predictions for new chemicals, for example in chemical risk assessment. Where predictions of biological activity are used as an indicator of a potential hazard, false-negative predictions are the most concerning; however, assigning confidence to inactive predictions is particularly challenging. How can one confidently establish the absence of activating features? In this study, we present methods for assigning confidence to both active and inactive predictions from structural alerts for protein-binding molecular initiating events (MIEs). The structural alerts were derived through an iterative statistical method. Confidence in each activity prediction is assigned by measuring the Tanimoto similarity between the Morgan fingerprints of test-set chemicals and those of relevant training-set chemicals, and cutoff values were defined to give different confidence categories. To avoid a potential compound-series bias in the test set, which would lead to an overestimate of the method's performance, we measured the biological activity of 27 compounds against 24 proteins, yielding an additional 648 experimental measurements, many of which are not currently available in the literature or in databases. This data set was complemented with newly measured biological activities published in ChEMBL25 to form a combined independent validation data set. Applying the confidence categories to the computational predictions for the new data identifies chemicals for which either an inactive or an active prediction can be made with confidence, allowing model predictions to be used responsibly.
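The similarity-based confidence assignment described above can be sketched in a few lines. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: fingerprints are represented as sets of on-bit indices (in practice Morgan fingerprints would come from a cheminformatics toolkit, for example RDKit's `AllChem.GetMorganFingerprintAsBitVect`), the query is scored by its maximum Tanimoto similarity to the "relevant" training chemicals (which the caller is assumed to supply), and the cutoffs in `HYPOTHETICAL_CUTOFFS` are placeholders, not the values defined in the paper.

```python
from collections.abc import Iterable

# Hypothetical placeholder cutoffs, NOT the values defined in the paper:
# (minimum max-similarity, confidence label), checked from highest to lowest.
HYPOTHETICAL_CUTOFFS: tuple[tuple[float, str], ...] = ((0.7, "high"), (0.4, "medium"))


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| for fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as no evidence of similarity
    return len(a & b) / union


def confidence_category(
    query: frozenset[int],
    relevant_training: Iterable[frozenset[int]],
    cutoffs: tuple[tuple[float, str], ...] = HYPOTHETICAL_CUTOFFS,
) -> tuple[str, float]:
    """Return (label, max similarity) of a query chemical against the relevant training chemicals."""
    # Assumption: confidence is driven by the single nearest training neighbour.
    best = max((tanimoto(query, t) for t in relevant_training), default=0.0)
    for threshold, label in cutoffs:
        if best >= threshold:
            return label, best
    return "low", best  # below every cutoff, or no relevant training chemicals at all


if __name__ == "__main__":
    # Toy fingerprints with made-up bit indices standing in for Morgan fingerprints.
    training = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
    query = frozenset({1, 2, 3, 5})
    label, sim = confidence_category(query, training)
    print(label, round(sim, 2))  # medium 0.6
```

Here the query shares 3 of 5 distinct on-bits with the first training chemical (similarity 0.6) and none with the second, so it falls into the placeholder "medium" band. With real RDKit bit vectors, `DataStructs.BulkTanimotoSimilarity` would replace the hand-written loop; the set-based version is used only to keep the sketch self-contained.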
