Chemoinformatic Classification Methods and their Applicability Domain

Author(s) - Mathea Miriam, Klingspohn Waldemar, Baumann Knut

Publication year - 2016

Publication title - Molecular Informatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.481

H-Index - 68

eISSN - 1868-1751

pISSN - 1868-1743

DOI - 10.1002/minf.201501019

Subject(s) - cheminformatics, applicability domain, computer science, classifier (uml), artificial intelligence, domain (mathematical analysis), data mining, outlier, machine learning, categorical variable, pattern recognition (psychology), quantitative structure–activity relationship, mathematics, bioinformatics, mathematical analysis, biology

Classification rules are often used in chemoinformatics to predict categorical properties of drug candidates related to bioactivity from explanatory variables that encode the respective molecular structures (i.e., molecular descriptors). To avoid predictions with an unduly large error probability, the domain to which the classifier is applied should be restricted to the domain covered by the training set objects. This latter domain is commonly referred to as the applicability domain in chemoinformatics. Conceptually, the applicability domain defines the region in space where the “normal” objects are located. Defining the border of the applicability domain may then be viewed as detecting anomalous or novel objects, or as detecting outliers. Currently, two different types of measures are in use. The first defines the applicability domain solely in terms of the molecular descriptor space, which is referred to as novelty detection. The second defines the applicability domain in terms of the expected reliability of the predictions, which is referred to as confidence estimation. Both types are systematically differentiated here, and the most popular measures are reviewed. It will be shown that all common chemoinformatic classifiers have built‐in confidence scores. Since confidence estimation uses information from the class labels to compute the confidence scores, it is expected to be more efficient in reducing the error rate than novelty detection, which uses only the information in the explanatory variables.
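
The distinction between the two types of measures can be illustrated with a minimal sketch. It is not taken from the article: the two-dimensional descriptor vectors, the class labels, the choice of k = 3, and the function names novelty_score and confidence_score are all assumptions made for illustration. The novelty score here is the mean distance to the k nearest training objects and ignores the class labels; the confidence score is the vote fraction of a k-nearest-neighbour classifier and therefore depends on them.

```python
import math
from collections import Counter

# Toy training set of (descriptor vector, class label) pairs.
# All values are invented for illustration only.
TRAIN = [
    ((0.0, 0.0), "inactive"), ((0.2, 0.1), "inactive"), ((0.1, 0.3), "inactive"),
    ((1.0, 1.0), "active"), ((1.2, 0.9), "active"), ((0.9, 1.2), "active"),
]


def nearest(query, k):
    """Return the k training objects closest to query as (distance, label)."""
    dists = [(math.dist(query, x), y) for x, y in TRAIN]
    return sorted(dists)[:k]


def novelty_score(query, k=3):
    """Mean distance to the k nearest training objects (labels are ignored)."""
    return sum(d for d, _ in nearest(query, k)) / k


def confidence_score(query, k=3):
    """Predicted class and the fraction of the k neighbours voting for it."""
    votes = Counter(label for _, label in nearest(query, k))
    label, count = votes.most_common(1)[0]
    return label, count / k


if __name__ == "__main__":
    queries = {
        "inside active cluster": (1.0, 1.1),
        "between the classes": (0.55, 0.6),
        "far from all training data": (5.0, 5.0),
    }
    for name, q in queries.items():
        label, conf = confidence_score(q)
        print(f"{name}: novelty={novelty_score(q):.2f}, "
              f"predicted={label}, confidence={conf:.2f}")
```

In this toy data, the query lying between the two classes receives a lower vote fraction than the query inside the active cluster, while the distant query receives the largest novelty score. A threshold on either score (a further assumption, to be chosen by the user) would then decide which predictions are accepted.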
