An Advanced Group Contribution Method for High‐Dimensional, Sparse Data Sets
Author(s) - Lee Chang Jun, Lee Jong Min
Publication year - 2012
Publication title - Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.201100111
Subject(s) - group (periodic table) , computer science , data mining , chemistry , organic chemistry
Today’s chemical processes involve many components, and their basic physical properties must be known for process design and operation. However, property information for every component cannot always be found in the literature. When it is missing, there are generally two ways to evaluate the properties of a chemical compound: experimental measurement and predictive approaches based on empirical models. The latter includes the group contribution method (GCM), whose basic concept is that specific functional groups or fragments of a molecule contribute to the value of its physical property. The advantage of GCMs is that they require less effort and cost than experiments. This study proposes a novel GCM suitable for high‐dimensional, sparse data sets. To improve its applicability and accuracy, the database is extended and divided into non‐ring group compounds and ring group compounds. Support vector regression (SVR) is adopted as the regression model, and a derivative‐free optimization approach, particle swarm optimization, is incorporated into the parameter optimization step of learning the SVR model to avoid local optima. The performance of the proposed model is compared with that of other GCMs.
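The basic GCM concept described in the abstract can be illustrated with a minimal sketch. It assumes the simplest additive form, in which a property estimate is a base value plus the sum of each group's count times its contribution. This is not the model proposed in the paper, which uses SVR rather than a linear sum. The group names, the contribution values, the base value, and the function name `estimate_property` are all hypothetical and chosen only for illustration; they are not fitted to any data. The dictionary of group counts also hints at why such data sets are sparse: a molecule contains only a few of the many possible groups.

```python
from typing import Dict

# Hypothetical contribution values, for illustration only (not fitted data).
CONTRIBUTIONS: Dict[str, float] = {"CH3": 2.0, "CH2": 1.5, "OH": 4.0}
# Hypothetical base (intercept) term of the additive model.
BASE = 10.0


def estimate_property(
    group_counts: Dict[str, int],
    contributions: Dict[str, float] = CONTRIBUTIONS,
    base: float = BASE,
) -> float:
    """Additive group contribution estimate: base + sum(count * contribution).

    Only the groups present in the molecule are stored in group_counts,
    so the input is a sparse representation of the full group vector.
    """
    missing = sorted(set(group_counts) - set(contributions))
    if missing:
        # A group without a known contribution cannot be estimated.
        raise KeyError(f"no contribution available for groups: {missing}")
    return base + sum(n * contributions[g] for g, n in group_counts.items())


# Ethanol decomposed into one CH3, one CH2, and one OH group.
ethanol = {"CH3": 1, "CH2": 1, "OH": 1}
print(estimate_property(ethanol))
```

A nonlinear regressor such as SVR would take the same group-count vector as input but would not be restricted to a weighted sum of the counts.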