Universal Approach for Structural Interpretation of QSAR/QSPR Models
Author(s) - Polishchuk Pavel G., Kuz'min Victor E., Artemenko Anatoly G., Muratov Eugene N.
Publication year - 2013
Publication title - Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.201300029
Subject(s) - quantitative structure–activity relationship , random forest , computer science , artificial intelligence , interpretation (philosophy) , support vector machine , universality (dynamical systems) , applicability domain , molecular descriptor , cheminformatics , machine learning , data mining , simplex , fragment (logic) , mathematics , algorithm , chemistry , computational chemistry , physics , quantum mechanics , programming language , geometry
In this paper we propose a novel approach to the structural interpretation of QSAR models. The major advantage of the methodology is its universality: it can be applied to any QSAR/QSPR model, irrespective of the chemical descriptors and machine learning methods used. This universality is achieved by interpreting model outcomes using only information obtained from substructures of the compounds of interest. The reliability of the proposed approach was confirmed in three case studies covering end-points of different types (continuous and binary classification) and nature (solubility, mutagenicity, and inhibition of Transglutaminase 2), various fragment and whole-molecule descriptors (Simplex and Dragon), and multiple modeling techniques (partial least squares, random forest, and support vector machines). We compared the global contributions of molecular fragments obtained with our methodology against known, experimentally derived SAR rules. In all cases, our interpretation showed high concordance with results published by others. Although the approach can easily be extended to any type of descriptor, we recommend Simplex descriptors because they yield a larger variety of molecular fragments to investigate. The approach is a useful tool for interpreting "black box" models such as random forests and neural networks. Analysis of the global contributions of fragments and their variation across a dataset could help identify key fragments and structural alerts. This information could help maximize the positive influence of a fragment's structural surroundings and reduce their negative effects.
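The abstract states only that model outcomes are interpreted using substructure information. The sketch below illustrates one way such a model-agnostic scheme could work, under stated assumptions. It assumes that a fragment's local contribution is the model's prediction for the whole molecule minus its prediction for the molecule with that fragment removed. It also assumes that a fragment's global contribution is the mean of its local contributions over all compounds that contain it, with the standard deviation serving as the "variation across a dataset". Several parts are hypothetical and exist only for illustration: molecules are represented as plain lists of fragment labels, and `toy_model`, its weights, and its interaction term are invented. A real pipeline would delete the fragment's atoms with a cheminformatics toolkit and recompute descriptors before re-predicting.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy representation: a molecule is a list of fragment labels.
Molecule = Sequence[str]
# Any "black box" model: takes a molecule, computes its own descriptors, returns a prediction.
Model = Callable[[Molecule], float]


def remove_fragment(mol: Molecule, fragment: str) -> List[str]:
    """Return a copy of the molecule with every occurrence of `fragment` removed."""
    return [f for f in mol if f != fragment]


def local_contributions(model: Model, mol: Molecule) -> Dict[str, float]:
    """Assumed definition: prediction(whole molecule) - prediction(molecule without fragment)."""
    full = model(mol)
    return {frag: full - model(remove_fragment(mol, frag)) for frag in set(mol)}


def global_contributions(
    model: Model, dataset: Sequence[Molecule]
) -> Dict[str, Tuple[float, float, int]]:
    """Per fragment: (mean local contribution, population std, number of compounds)."""
    per_fragment: Dict[str, List[float]] = {}
    for mol in dataset:
        for frag, value in local_contributions(model, mol).items():
            per_fragment.setdefault(frag, []).append(value)
    return {frag: (mean(v), pstdev(v), len(v)) for frag, v in per_fragment.items()}


def toy_model(mol: Molecule) -> float:
    """Hypothetical nonlinear model; weights are invented for illustration only."""
    weights = {"OH": 1.0, "Cl": -0.5, "NO2": -1.5, "C6H5": -0.25}
    score = sum(weights.get(f, 0.0) for f in mol)
    # Interaction term: the effect of OH depends on its structural surroundings.
    if "OH" in mol and "NO2" in mol:
        score -= 0.5
    return score


if __name__ == "__main__":
    dataset = [["C6H5", "OH"], ["C6H5", "OH", "NO2"], ["C6H5", "Cl"]]
    for frag, (mu, sd, n) in sorted(global_contributions(toy_model, dataset).items()):
        print(f"{frag}: mean={mu:+.2f} sd={sd:.2f} n={n}")
```

Because the interpretation code only calls `model(...)`, it does not depend on which descriptors or learning method sit inside the model. In this toy run, "OH" receives a nonzero standard deviation (local contributions of 1.0 and 0.5) because its effect changes when "NO2" is present. Fragments whose contributions vary in this way are the ones where the structural surroundings matter.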