Trustworthiness, the Key to Grid-Based Map-Driven Predictive Model Enhancement and Applicability Domain Control

Open Access

Author(s) - Dragos Horvath, Gilles Marcou, Alexandre Varnek

Publication year - 2020

Publication title - Journal of Chemical Information and Modeling

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.24

H-Index - 160

eISSN - 1549-960X

pISSN - 1549-9596

DOI - 10.1021/acs.jcim.0c00998

Subject(s) - applicability domain, computer science, grid, data mining, set (abstract data type), domain (mathematical analysis), node (physics), trustworthiness, artificial intelligence, function (biology), covariant transformation, machine learning, quantitative structure–activity relationship, mathematics, mathematical analysis, geometry, computer security, structural engineering, evolutionary biology, engineering, biology, programming language

In chemography, grid-based maps sample molecular descriptor space by placing a set of nodes in it and then linking those nodes to a regular 2D grid that represents the map. They include self-organizing maps (SOMs) and generative topographic maps (GTMs). Grid-based maps are predictive because any compound projected onto them can "inherit" the properties of its residence node(s); node properties are themselves "inherited" from the training set compounds neighboring each node. This Article proposes a formalism to define the trustworthiness of these nodes as "providers" of structure-activity information captured from training compounds. An empirical four-parameter node trustworthiness (NT) function of density (sparsely populated nodes are less trustworthy) and coherence (nodes whose training set residents have divergent properties are less trustworthy) is proposed. Based on it, a trustworthiness score T is used to delimit the applicability domain (AD) by means of a trustworthiness threshold TT. For each parameter setup, the success of the ensuing inside-AD predictions is monitored. Setup-specific success levels (averaged over large pools of prediction challenges) are found to be highly covariant, irrespective of the targets of the prediction challenges, the type of problem (classification or regression), the specific parametrization, and even the nature (GTM or SOM) of the underlying maps. Thus, success levels determined from regression problems (445 target-specific affinity QSAR sets) on GTMs and levels returned by completely unrelated classification problems (319 target-specific active-/inactive-labeled sets) on SOMs were seen to correlate to a degree of 70%. Therefore, a common, general-purpose setup of the parametric AD definition proposed herein was shown to apply generally to grid-based map-driven property prediction problems.
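
The abstract does not give the functional form of the NT function, so the following Python sketch is purely illustrative and is not the authors' method. It assumes a hypothetical NT built as the product of two logistic terms (one rising with node density, one falling with the spread of resident properties), which yields four parameters: `d_half`, `d_steep`, `s_half`, and `s_steep`. All function names, the parameter names, and the responsibility-weighted definition of T are assumptions made for this example.

```python
import math


def node_trustworthiness(density, spread,
                         d_half=2.0, d_steep=1.0,
                         s_half=1.0, s_steep=1.0):
    """Hypothetical NT in (0, 1); NOT the published functional form.

    density: amount of training compounds residing in the node.
    spread:  divergence of their property values (e.g. a standard deviation).
    The four parameters are illustrative midpoints and steepnesses.
    """
    # Rises with density: sparsely populated nodes score low.
    density_term = 1.0 / (1.0 + math.exp(-d_steep * (density - d_half)))
    # Falls with spread: nodes with divergent residents score low.
    coherence_term = 1.0 / (1.0 + math.exp(s_steep * (spread - s_half)))
    return density_term * coherence_term


def compound_trust(responsibilities, node_nt):
    """Assumed score T: responsibility-weighted mean of node NT values.

    For a SOM a compound resides in a single node (one weight of 1.0);
    for a GTM the weights are fuzzy responsibilities over several nodes.
    """
    total = sum(responsibilities)
    weighted = sum(r * nt for r, nt in zip(responsibilities, node_nt))
    return weighted / total


def inside_ad(t_score, threshold):
    """A compound is inside the applicability domain if T reaches TT."""
    return t_score >= threshold


if __name__ == "__main__":
    # Two toy nodes: one dense and coherent, one sparse and incoherent.
    nt_values = [node_trustworthiness(6.0, 0.2),
                 node_trustworthiness(0.5, 2.5)]
    # A compound residing mostly in the first node.
    t = compound_trust([0.9, 0.1], nt_values)
    print("T =", round(t, 3), "inside AD:", inside_ad(t, threshold=0.5))
```

In this sketch, raising the threshold shrinks the applicability domain to compounds that reside in dense, coherent nodes; monitoring prediction success as a function of the four parameters and the threshold mirrors, in spirit only, the setup comparison described in the abstract.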
