Prediction of the heat capacity for compounds based on the conjugate gradient and support vector machine methods
Author(s) - Shi Jingjie, Chen Liping, Chen Wanghua
Publication year - 2013
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2532
Subject(s) - applicability domain , support vector machine , ant colony optimization algorithms , molecular descriptor , conjugate gradient method , correlation coefficient , cross validation , artificial intelligence , computer science , plot (graphics) , mathematics , biological system , algorithm , pattern recognition (psychology) , quantitative structure–activity relationship , machine learning , statistics , biology
A quantitative structure–property relationship (QSPR) model for predicting heat capacity was developed from molecular structures. Molecular structure descriptors covering 18 categories in total were calculated with DRAGON 2.1 to represent the structures of the compounds. The ant colony optimization (ACO) algorithm, a novel variable selection method, was used to select from this large pool an optimal subset of descriptors that contribute significantly to the property; five descriptors were retained as input parameters. Using the same five descriptors, ACO was coupled with the conjugate gradient (CG) method to build a linear model (ACO-CG) and with the support vector machine (SVM) method to build a nonlinear model (ACO-SVM). Both models proved robust, with small prediction errors. The fitting and predictive performance of the ACO-SVM model (squared correlation coefficients $R^2_{\text{train}} = 0.9607$, $R^2_{\text{test}} = 0.9398$) is better than that of the ACO-CG model ($R^2_{\text{train}} = 0.9404$, $R^2_{\text{test}} = 0.9281$). For a stricter validation test, the traditional parameters $Q^2_{\text{LOO}}$ (internal validation) and $Q^2_{\text{ext}}$ (external validation) were supplemented with two newer parameters, $r_m^2$ and $^{c}R_p^2$. The developed models met the required thresholds for these parameters ($\overline{r_m^2} > 0.5$ and $\Delta r_m^2 < 0.2$ for $r_m^2$; $^{c}R_p^2 > 0.5$). From this analysis, it can be concluded that the proposed methods can successfully predict heat capacity from preselected theoretical descriptors, which are calculated directly and solely from the molecular structure. The applicability domain of the model was assessed with a Williams plot. Copyright © 2013 John Wiley & Sons, Ltd.
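The abstract names ACO as the descriptor-selection step but gives no pheromone model, fitness function, or parameter values. As a rough illustration only, the sketch below assumes a simple scheme. Each descriptor carries a pheromone value, and each ant samples a fixed-size subset with probability proportional to pheromone. Fitness is the training $R^2$ of an ordinary least-squares fit, a stand-in swapped in here rather than the paper's CG or SVM model. After evaporation, the iteration-best ant deposits its fitness. All function names, parameter values (`n_ants`, `n_iter`, `rho`, `tau_min`), and the synthetic data are hypothetical.

```python
import random


def solve(M, b):
    """Solve the small dense system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(row) + [b[i]] for i, row in enumerate(M)]  # augmented matrix [M | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def r2_fit(X, y, cols):
    """Training R^2 of an OLS fit (with intercept) on the chosen descriptor columns."""
    A = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    rhs = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]
    coef = solve(M, rhs)
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - sum(c * v for c, v in zip(coef, a))) ** 2 for a, yi in zip(A, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def aco_select(X, y, k, n_ants=20, n_iter=40, rho=0.2, tau_min=0.05, seed=0):
    """Hypothetical ACO subset search: returns (best sorted column list, best R^2)."""
    rng = random.Random(seed)
    m = len(X[0])
    tau = [1.0] * m  # one pheromone value per descriptor
    best_cols, best_fit = None, float("-inf")
    for _ in range(n_iter):
        ants = []
        for _ in range(n_ants):
            remaining = list(range(m))
            cols = []
            for _ in range(k):  # pick k distinct descriptors, weighted by pheromone
                j = rng.choices(remaining, weights=[tau[c] for c in remaining])[0]
                cols.append(j)
                remaining.remove(j)
            fit = r2_fit(X, y, cols)
            ants.append((fit, cols))
            if fit > best_fit:
                best_fit, best_cols = fit, sorted(cols)
        # Evaporate (with a floor to keep exploring), then let the iteration-best ant deposit.
        tau = [max(tau_min, (1.0 - rho) * t) for t in tau]
        it_fit, it_cols = max(ants)
        for j in it_cols:
            tau[j] += it_fit
    return best_cols, best_fit


if __name__ == "__main__":
    # Synthetic data: 6 hypothetical descriptors; the property depends on descriptors 0 and 3.
    data_rng = random.Random(42)
    X = [[data_rng.uniform(-1, 1) for _ in range(6)] for _ in range(40)]
    y = [2.0 * r[0] - 3.0 * r[3] + data_rng.gauss(0, 0.05) for r in X]
    print(aco_select(X, y, k=2))
```

Keeping a pheromone floor (`tau_min`) is one common way to stop the colony from locking onto an early, mediocre subset; real ACO variants differ in how they construct solutions and update pheromone.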
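The abstract does not say how the conjugate gradient method enters the linear ACO-CG model. One plausible reading, which is an assumption, is that the regression coefficients are obtained by running CG on the least-squares normal equations $(A^\top A)\,b = A^\top y$, where $A$ is the descriptor matrix with an intercept column. The sketch also computes the leave-one-out statistic $Q^2_{\text{LOO}} = 1 - \text{PRESS} / \sum_i (y_i - \bar{y})^2$, using the full training-set mean in the denominator (conventions differ). Helper names and toy data are hypothetical.

```python
def normal_equations(X, y):
    """Return (A^T A, A^T y), where A is X with a leading intercept column of ones."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    rhs = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]
    return M, rhs


def conjugate_gradient(M, b, rtol=1e-10, max_iter=None):
    """Solve M x = b for a symmetric positive-definite M, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - M x for x = 0
    p = list(r)  # first search direction
    rs_old = sum(v * v for v in r)
    stop = (rtol ** 2) * rs_old  # relative tolerance on the squared residual norm
    for _ in range(max_iter or 10 * n):
        if rs_old <= stop:
            break
        Mp = [sum(m * v for m, v in zip(row, p)) for row in M]
        alpha = rs_old / sum(a * c for a, c in zip(p, Mp))  # exact line search step
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * mi for ri, mi in zip(r, Mp)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]  # new M-conjugate direction
        rs_old = rs_new
    return x


def fit_linear(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] obtained via CG."""
    M, rhs = normal_equations(X, y)
    return conjugate_gradient(M, rhs)


def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def q2_loo(X, y):
    """Q^2_LOO: refit without each compound in turn and accumulate PRESS."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_linear(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    y_mean = sum(y) / len(y)
    return 1.0 - press / sum((v - y_mean) ** 2 for v in y)


if __name__ == "__main__":
    # Noise-free toy data with two hypothetical descriptors.
    X = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 4], [5, 2]]
    y = [1 + 2 * a - 0.5 * b for a, b in X]
    print(fit_linear(X, y), q2_loo(X, y))
```

Forming $A^\top A$ squares the condition number, which is harmless for a handful of descriptors but is a reason libraries often prefer CG variants (such as CGLS/LSQR) that work on $A$ directly.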
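The abstract does not define $r_m^2$ or $^{c}R_p^2$. The sketch below assumes the formulations commonly used in the QSAR validation literature:

- $r_m^2 = r^2\,(1 - \sqrt{|r^2 - r_0^2|})$, where $r_0^2$ comes from a regression of observed on predicted values forced through the origin.
- $\overline{r_m^2}$ and $\Delta r_m^2$ are the mean and absolute difference of $r_m^2$ computed with the two axes swapped.
- $^{c}R_p^2 = R\sqrt{R^2 - R_r^2}$, with $R_r^2$ taken here as the mean $R^2$ of y-randomized models.

Other variants of $r_0^2$ exist, so treat this as illustrative. Function names are hypothetical.

```python
from math import sqrt


def _mean(v):
    return sum(v) / len(v)


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = _mean(obs), _mean(pred)
    sop = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    soo = sum((o - mo) ** 2 for o in obs)
    spp = sum((p - mp) ** 2 for p in pred)
    return sop ** 2 / (soo * spp)


def r0_squared(obs, pred):
    """Determination coefficient of obs ~ k * pred (regression through the origin)."""
    k = sum(o * p for o, p in zip(obs, pred)) / sum(p * p for p in pred)
    mo = _mean(obs)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def r_m2(obs, pred):
    """r_m^2 = r^2 * (1 - sqrt(|r^2 - r0^2|))."""
    r2 = r_squared(obs, pred)
    return r2 * (1.0 - sqrt(abs(r2 - r0_squared(obs, pred))))


def r_m2_mean_delta(obs, pred):
    """Mean and absolute difference of r_m^2 for both axis orientations."""
    rm = r_m2(obs, pred)
    rm_swapped = r_m2(pred, obs)
    return (rm + rm_swapped) / 2.0, abs(rm - rm_swapped)


def c_rp2(r2_model, r2_randomized):
    """cR_p^2 = R * sqrt(R^2 - R_r^2); R_r^2 assumed to be the mean R^2 of randomized models."""
    r2_r = _mean(r2_randomized)
    # Clamp at zero: if randomized models fit as well as the real one, the metric is 0.
    return sqrt(r2_model) * sqrt(max(0.0, r2_model - r2_r))
```

With these definitions, a model passes the criteria quoted in the abstract when `r_m2_mean_delta` returns a mean above 0.5 and a delta below 0.2, and `c_rp2` exceeds 0.5. Note that $^{c}R_p^2$ equals $R^2$ when the randomized models have $R_r^2 = 0$.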
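A Williams plot places each compound's leverage on the x-axis and its standardized residual on the y-axis. The sketch assumes the usual conventions. Leverage is $h_i = x_i^\top (A^\top A)^{-1} x_i$, computed against the training matrix $A$; including an intercept column in $A$ is an assumption. The warning leverage is $h^* = 3(p+1)/n$ for $p$ descriptors and $n$ training compounds, and the residual limit is $\pm 3$ standardized units. Function names and the toy data are hypothetical.

```python
def _solve(M, b):
    """Gaussian elimination with partial pivoting for a small dense system M z = b."""
    n = len(b)
    A = [list(row) + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][k] * z[k] for k in range(r + 1, n))) / A[r][r]
    return z


def leverages(X_train, X_query):
    """Leverage of each query compound with respect to the training descriptor matrix."""
    A = [[1.0] + list(r) for r in X_train]
    p = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]  # A^T A
    out = []
    for row in X_query:
        x = [1.0] + list(row)
        z = _solve(M, x)  # z = (A^T A)^{-1} x without forming the inverse
        out.append(sum(xi * zi for xi, zi in zip(x, z)))
    return out


def warning_leverage(n_train, n_descriptors):
    """h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train


def outside_domain(h, std_residuals, h_star, resid_limit=3.0):
    """Flag compounds beyond h* (structural outliers) or beyond +/- resid_limit (response outliers)."""
    return [hi > h_star or abs(ri) > resid_limit for hi, ri in zip(h, std_residuals)]


if __name__ == "__main__":
    X = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 4], [5, 2]]  # toy training descriptors
    h_star = warning_leverage(len(X), 2)
    print(leverages(X, X), leverages(X, [[50, -40]]), h_star)
```

Because the training leverages sum to $p+1$ when the intercept is included, $h^*$ is three times their average, which is the usual rationale for the cutoff.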