A novel tree kernel partial least squares for modeling the structure–activity relationship
Author(s) - Huang Xin, Cao DongSheng, Xu QingSong, Liang YiZeng
Publication year - 2013
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2490
Subject(s) - kernel (algebra), pattern recognition (psychology), artificial intelligence, categorical variable, partial least squares regression, kernel principal component analysis, kernel method, computer science, tree (set theory), decision tree, mathematics, basis (linear algebra), ranking (information retrieval), feature (linguistics), data mining, machine learning, support vector machine, mathematical analysis, linguistics, philosophy, geometry, combinatorics
Kernel partial least squares (KPLS), a nonlinear extension of linear PLS, has become a popular technique for regression and classification of complex data sets. In KPLS, training samples are transformed into a feature space via a nonlinear mapping, and the PLS algorithm is then carried out in that feature space. In the present study, we develop a novel tree KPLS (TKPLS) classification algorithm by constructing an informative kernel on the basis of decision tree ensembles. The constructed tree kernel can effectively capture the similarities between samples and, through variable importance ranking during kernel construction, select informative features. Through this kernel, TKPLS can also handle nonlinear relationships in structure–activity relationship data. Three data sets related to different categorical bioactivities of compounds are used to evaluate the performance of TKPLS. The results suggest that TKPLS can be regarded as a promising alternative classification technique. Copyright © 2013 John Wiley & Sons, Ltd.