A Combinational Strategy of Model Disturbance and Outlier Comparison to Define Applicability Domain in Quantitative Structural Activity Relationship
Author(s) - Yan Jun, Zhu WeiWei, Kong Bo, Lu HongBing, Yun YongHuan, Huang JianHua, Liang YiZeng
Publication year - 2014
Publication title - Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.201300161
Subject(s) - outlier , applicability domain , leverage (statistics) , computer science , data mining , test set , quantitative structure–activity relationship , set (abstract data type) , artificial intelligence , pattern recognition (psychology) , machine learning , programming language
To define an applicability domain for quantitative structure-activity relationship modeling, a combinational strategy of model disturbance and outlier comparison was developed. An indicator, the model disturbance index, was defined to estimate the prediction error. In addition, information about the outliers in the training set was used to filter out unreliable samples in the test set on the basis of “structural similarity”. Chromatographic retention index data were used to investigate this approach. A relationship between the model disturbance index and the prediction error was observed. Comparing the outlier set with the test set could also provide additional information about which unknown samples deserve more attention. A novel technique based on model population analysis was used to evaluate the validity of the applicability domain. Finally, three commonly used methods, namely the leverage, descriptor range-based, and model perturbation methods, were compared with the proposed approach.
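
For readers unfamiliar with two of the baseline methods named in the abstract, the following is a minimal sketch of a leverage check and a descriptor range check. It does not reproduce the paper's model disturbance index or its outlier comparison, which the abstract does not specify. The function names, the toy data, and the warning threshold h* = 3(p + 1)/n are assumptions; that threshold is a common convention in the QSAR literature, not something stated in this record.

```python
# Illustrative sketch only: two generic applicability-domain checks.
# All names and the threshold below are assumptions, not the authors' code.

def descriptor_range_ad(train, test):
    """True for test samples whose every descriptor lies in the training min/max range."""
    lows = [min(col) for col in zip(*train)]
    highs = [max(col) for col in zip(*train)]
    return [all(lo <= v <= hi for v, lo, hi in zip(row, lows, highs))
            for row in test]

def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def leverages(train, samples):
    """Leverage h = x^T (X^T X)^-1 x, with an intercept column prepended to X."""
    X = [[1.0] + list(row) for row in train]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    result = []
    for s in samples:
        x = [1.0] + list(s)
        z = solve(xtx, x)  # z = (X^T X)^-1 x
        result.append(sum(xi * zi for xi, zi in zip(x, z)))
    return result

def leverage_ad(train, test):
    """True for test samples whose leverage does not exceed h* = 3(p + 1)/n (assumed)."""
    n, p = len(train), len(train[0])
    h_star = 3.0 * (p + 1) / n
    return [h <= h_star for h in leverages(train, test)]

# Toy descriptor matrix (hypothetical data): 5 training samples, 2 descriptors.
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
test = [[0.5, 0.5], [5.0, 5.0]]
inside_range = descriptor_range_ad(train, test)
inside_leverage = leverage_ad(train, test)
```

In this toy case both checks accept the first test sample, which sits at the centre of the training data, and reject the second, which lies far outside it. A production implementation would normally use a linear algebra library rather than the hand-written solver shown here.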