Prediction of retention indices for frequently reported compounds of plant essential oils using multiple linear regression, partial least squares, and support vector machine
Author(s) -
Yan Jun,
Huang JianHua,
He Min,
Lu HongBing,
Yang Rui,
Kong Bo,
Xu QingSong,
Liang YiZeng
Publication year - 2013
Publication title -
Journal of Separation Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.72
H-Index - 102
eISSN - 1615-9314
pISSN - 1615-9306
DOI - 10.1002/jssc.201300254
Subject(s) - partial least squares regression, multivariate statistics, support vector machine, linear regression, mathematics, statistics, bayesian multivariate linear regression, correlation coefficient, partial correlation, feature selection, least squares support vector machine, regression analysis, selection (genetic algorithm), biological system, correlation, artificial intelligence, computer science, biology, geometry
Retention indices for frequently reported compounds of plant essential oils on three different stationary phases were investigated. Multivariate linear regression, partial least squares, and support vector machine, combined with random‐frog, a new variable selection approach recently proposed by our group, were employed to model quantitative structure–retention relationships. Internal and external validations were performed to ensure the stability and predictive ability of the models. All three methods yielded acceptable models. The best results were obtained by support vector machine based on a small number of informative descriptors, with squared correlation coefficients for cross validation of 0.9726, 0.9759, and 0.9331 on the dimethylsilicone stationary phase, the dimethylsilicone phase with 5% phenyl groups, and the PEG stationary phase, respectively. The performances of two variable selection approaches, random‐frog and genetic algorithm, were compared. The importance of the variables was found to be consistent when estimated from correlation coefficients in multivariate linear regression equations and from selection probability in model spaces.
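
As a rough illustration of the cross-validated statistic quoted in the abstract, the sketch below computes a leave-one-out Q² for a one-descriptor linear model. It assumes the common definition Q² = 1 − PRESS/TSS, which the abstract does not spell out, and it is not the authors' procedure. The descriptor values, retention indices, and function names are hypothetical and chosen only for demonstration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x for a single descriptor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def q2_leave_one_out(xs, ys):
    """Cross-validated Q^2 = 1 - PRESS / TSS (assumed definition)."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model with sample i held out, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        press += (ys[i] - (a + b * xs[i])) ** 2
    y_bar = mean(ys)
    tss = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - press / tss


# Hypothetical data: one molecular descriptor and made-up retention indices.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
retention_index = [810.0, 905.0, 1012.0, 1098.0, 1207.0, 1295.0]

q2 = q2_leave_one_out(descriptor, retention_index)
print(f"Leave-one-out Q^2 = {q2:.4f}")
```

The same hold-out loop applies to partial least squares or support vector machine models by swapping `fit_line` for another fitting routine. A multi-descriptor model would need a matrix-based solver instead of the closed-form slope used here.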