An evaluation of experimental design in QSAR modelling utilizing the k-medoid clustering
Author(s) -
Brandmaier Stefan,
Tetko Igor V.,
Öberg Tomas
Publication year - 2012
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2459
Subject(s) - medoid, cheminformatics, quantitative structure–activity relationship, computer science, cluster analysis, curse of dimensionality, chemometrics, data mining, machine learning, artificial intelligence, mathematics, chemistry, computational chemistry
A reliable selection of a representative subset of chemical compounds has been reported to be crucial for numerous tasks in computational chemistry and chemoinformatics. We investigated the usability of an approach based on the k-medoid algorithm for this task, in particular for experimental design and for splitting data into training and validation sets. To this end, we compared the performance of models derived from such a selection with that of models derived using several other approaches, such as space-filling design and D-optimal design. We validated the performance on four datasets with different endpoints, representing toxicity, physicochemical properties and others. Compared with the models derived from compounds selected by the other examined approaches, those derived from the k-medoid selection showed high reliability for experimental design, as their performance was consistently among the best for all examined datasets. Of all the models derived with the examined approaches, only those derived with the k-medoid approach showed a significantly improved performance compared with a random selection for all datasets, over the whole examined range of selected compounds, and for each dimensionality of the search space. Copyright © 2012 John Wiley & Sons, Ltd.
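The abstract does not state the distance measure, initialisation or descriptors used, so the following is only a minimal, hypothetical sketch of k-medoid selection for a training/validation split. It assumes Euclidean distance on a precomputed descriptor matrix (one row per compound) and uses the classical PAM-style BUILD/SWAP heuristic. The function names `kmedoid_select` and `split_train_validation` are illustrative and are not taken from the paper. Unlike k-means centroids, medoids are always actual data points, which is why the approach yields real compounds that can be selected for an experimental design.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    # Assumption: Euclidean distance in descriptor space.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_cost(dist: List[List[float]], medoids: List[int]) -> float:
    # Sum over all compounds of the distance to their nearest medoid.
    return sum(min(row[m] for m in medoids) for row in dist)


def kmedoid_select(X: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Hypothetical helper: return indices of k medoid compounds (PAM-style)."""
    n = len(X)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of compounds")
    dist = [[euclidean(X[i], X[j]) for j in range(n)] for i in range(n)]

    # BUILD: greedily add the compound that lowers the total cost the most.
    medoids: List[int] = []
    for _ in range(k):
        best = min((c for c in range(n) if c not in medoids),
                   key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)

    # SWAP: exchange a medoid for a non-medoid while this lowers the cost.
    cost = total_cost(dist, medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:i] + [c] + medoids[i + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
        if not improved:
            break
    return sorted(medoids)


def split_train_validation(X: List[Point], k: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: medoids form the training set, the rest validation."""
    train = kmedoid_select(X, k)
    chosen = set(train)
    validation = [i for i in range(len(X)) if i not in chosen]
    return train, validation


if __name__ == "__main__":
    # Two well-separated groups of "compounds" in a 2-D descriptor space.
    X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(split_train_validation(X, 2))  # ([0, 3], [1, 2, 4, 5])
```

In this toy example one medoid is picked from each group, so the selected training compounds span the descriptor space instead of clustering in one region, which is the property a random selection does not guarantee. The exhaustive SWAP step scales poorly with the number of compounds; for larger datasets a library implementation would be preferable.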