QSPR study for prediction of boiling points of 2475 organic compounds using stochastic gradient boosting
Author(s) - Zhang Juehong, Liu Zaiming, Liu Wanrong
Publication year - 2014
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2587
Subject(s) - quantitative structure–activity relationship, boosting (machine learning), partial least squares regression, test set, boiling point, gradient boosting, mean squared error, cross validation, regression, random forest, linear regression, mathematics, regression analysis, statistics, computer science, chemistry, artificial intelligence, machine learning, organic chemistry
The normal boiling point is one of the major physicochemical properties used to characterize and identify an organic compound. In this study, a boosted regression tree model was developed to model the quantitative structure–property relationship (QSPR) for the boiling points of 2475 compounds with high structural heterogeneity. Stochastic gradient boosting (SGB) constructs an additive regression model by sequentially fitting a simple regression tree to the current “pseudo”-residuals by least squares at each iteration. The parameters of SGB were optimized using 10-fold cross-validation. The best SGB model, established using 2D descriptors, had an overall Q² of 0.957 and a root mean square error of validation of 17.89 for the validation set, and an R_T² of 0.954 and a root mean square error of test of 18.19 for the test set. Compared with other commonly used modeling methods such as partial least squares, classification and regression tree, and random forest, SGB not only achieved the best predictive ability but also, with the help of partial dependence plots, gave more useful insight into the relationship between properties and descriptors for the prediction of boiling points. SGB could be a promising tool in the field of QSPR research, especially for the screening of new compounds. Copyright © 2014 John Wiley & Sons, Ltd.
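
The SGB procedure described in the abstract (fit a simple tree to the current pseudo-residuals by least squares, on a random subsample, and add it to the model) can be illustrated with a minimal sketch. This is not the authors' implementation: the base learner here is a one-split regression stump, the data are synthetic stand-ins for descriptors and boiling points, and the names `fit_stump`, `fit_sgb`, `predict_sgb`, and the values of `n_trees`, `learning_rate`, and `subsample` are all assumptions chosen for illustration.

```python
import random

def fit_stump(X, r):
    """Least-squares regression stump: best single split over all features."""
    n, d = len(X), len(X[0])
    total = sum(r)
    best = None  # (gain, feature, threshold, left_mean, right_mean)
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            if X[order[k]][j] == X[order[k - 1]][j]:
                continue  # cannot split between identical feature values
            right_sum = total - left_sum
            # Minimizing the squared error is equivalent to maximizing this term.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or gain > best[0]:
                thr = 0.5 * (X[order[k]][j] + X[order[k - 1]][j])
                best = (gain, j, thr, left_sum / k, right_sum / (n - k))
    if best is None:  # no valid split: fall back to a constant prediction
        mean = total / n
        return (0, float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, x):
    j, thr, left, right = stump
    return left if x[j] <= thr else right

def fit_sgb(X, y, n_trees=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting with squared-error loss (illustrative)."""
    rng = random.Random(seed)
    n = len(X)
    f0 = sum(y) / n                      # initial model: the mean response
    pred = [f0] * n
    stumps = []
    m = max(2, int(subsample * n))
    for _ in range(n_trees):
        idx = rng.sample(range(n), m)    # the "stochastic" part: random subsample
        # For squared-error loss, pseudo-residuals are ordinary residuals.
        residuals = [y[i] - pred[i] for i in idx]
        stump = fit_stump([X[i] for i in idx], residuals)
        stumps.append(stump)
        for i in range(n):               # shrunken additive update
            pred[i] += learning_rate * predict_stump(stump, X[i])
    return f0, learning_rate, stumps

def predict_sgb(model, x):
    f0, lr, stumps = model
    return f0 + lr * sum(predict_stump(s, x) for s in stumps)

def rmse(y_true, y_pred):
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5

# Synthetic data: two hypothetical descriptors and a made-up response.
data_rng = random.Random(1)
X = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(200)]
y = [100 * x[0] + 50 * (x[1] > 0.5) + data_rng.gauss(0, 1) for x in X]

model = fit_sgb(X, y)
baseline_rmse = rmse(y, [sum(y) / len(y)] * len(y))
model_rmse = rmse(y, [predict_sgb(model, x) for x in X])
print("baseline RMSE:", round(baseline_rmse, 2), "SGB RMSE:", round(model_rmse, 2))
```

The errors printed here are computed on the training data only. A study like the one summarized above would instead tune the number of trees, the shrinkage, and the subsampling fraction by cross-validation and report errors on held-out validation and test sets.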
