Quantitative structure–property relationships of retention indices of some sulfur organic compounds using random forest technique as a variable selection and modeling method
Author(s) - Goudarzi Nasser, Shahsavani Davood, EmadiGandaghi Fereshteh, Chamjangali Mansour Arab
Publication year - 2016
Publication title - Journal of Separation Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.72
H-Index - 102
eISSN - 1615-9314
pISSN - 1615-9306
DOI - 10.1002/jssc.201600358
Subject(s) - random forest, artificial neural network, feature selection, stepwise regression, Kovats retention index, linear regression, quantitative structure–activity relationship, mathematics, regression analysis, regression, biological system, retention time, statistics, computer science, artificial intelligence, chemistry, machine learning, chromatography, gas chromatography, biology
In this work, a novel quantitative structure–property relationship technique based on the random forest is proposed for predicting the retention indices of some sulfur organic compounds. The retention indices are calculated from theoretical descriptors derived from the molecular structures of these compounds. The influence of the parameters that most affect the predictive power of the random forest, namely the number of randomly selected variables used to split each node (m) and the number of trees (n_t), is studied to obtain the best model. After optimization of n_t and m, the random forest model built with m = 70 and n_t = 460 was found to yield the best results. Artificial neural network and multiple linear regression models are also used to predict the retention indices of these compounds for comparison with the random forest model. The descriptors selected by stepwise regression and by the random forest model are used to build the artificial neural network models. The results show that the random forest model is superior to the other models for predicting the retention indices of the studied compounds.
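The tuning of m and n_t described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes that candidate (m, n_t) pairs are compared by out-of-bag mean squared error, which the abstract does not state. The data are synthetic stand-ins for molecular descriptors and retention indices, the grid values are illustrative rather than the reported m = 70 and n_t = 460, and the names `grow`, `predict_tree`, and `oob_mse` are hypothetical.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean of a non-empty list."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)

def grow(X, y, rows, m, rng, min_leaf=3):
    """Grow a regression tree on the given row indices.

    At each node only m randomly chosen features are tried (the "m" of the
    abstract). A leaf is a float (mean response); an internal node is a
    tuple (feature, threshold, left_subtree, right_subtree).
    """
    node_mean = sum(y[i] for i in rows) / len(rows)
    if len(rows) < 2 * min_leaf:
        return node_mean
    best = None
    for f in rng.sample(range(len(X[0])), m):
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbours
            left = [i for i in rows if X[i][f] <= t]
            right = [i for i in rows if X[i][f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return node_mean
    _, f, t, left, right = best
    return (f, t, grow(X, y, left, m, rng, min_leaf),
            grow(X, y, right, m, rng, min_leaf))

def predict_tree(node, x):
    """Follow the splits until a leaf value is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def oob_mse(X, y, m, n_trees, seed=0):
    """Out-of-bag mean squared error of a forest of n_trees trees."""
    rng = random.Random(seed)
    n = len(X)
    sums, counts = [0.0] * n, [0] * n
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = grow(X, y, boot, m, rng)
        for i in set(range(n)) - set(boot):  # rows this tree never saw
            sums[i] += predict_tree(tree, X[i])
            counts[i] += 1
    errors = [(sums[i] / counts[i] - y[i]) ** 2 for i in range(n) if counts[i]]
    return sum(errors) / len(errors)

# Synthetic data (assumption): 6 "descriptors", response driven by two of them.
data_rng = random.Random(1)
X = [[data_rng.uniform(0, 1) for _ in range(6)] for _ in range(60)]
y = [10 * x[0] + 5 * x[1] ** 2 + data_rng.gauss(0, 0.1) for x in X]

# Illustrative grid over m (variables per split) and n_t (number of trees).
scores = {(m, nt): oob_mse(X, y, m, nt) for m in (1, 3, 6) for nt in (5, 20)}
best_m, best_nt = min(scores, key=scores.get)
print("best m:", best_m, "best n_t:", best_nt)
```

Out-of-bag error is used here because it gives a validation estimate without a separate hold-out set; a cross-validated or external test set error could be substituted in `oob_mse` without changing the grid loop.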