A Study of Genetic Algorithm Evolution on the Lipophilicity of Polychlorinated Biphenyls
Author(s) -
Jäntschi Lorentz,
Bolboacă Sorana D.,
Sestraş Radu E.
Publication year - 2010
Publication title -
Chemistry & Biodiversity
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.427
H-Index - 70
eISSN - 1612-1880
pISSN - 1612-1872
DOI - 10.1002/cbdv.200900356
Subject(s) - partition coefficient, quantitative structure–activity relationship, mathematics, correlation coefficient, poisson distribution, statistics, gaussian, genetic algorithm, linear regression, lipophilicity, sample size determination, dimension (graph theory), algorithm, mathematical optimization, chemistry, combinatorics, chromatography, computational chemistry, organic chemistry, stereochemistry
The search for multivariate linear regression (MLR) models in quantitative structure–property relationships (QSPR) is a hard problem because of the size of the search space. A genetic algorithm (GA) was developed and assessed for selecting suitable descriptors to predict the octan-1-ol/H₂O partition coefficient of polychlorinated biphenyls. The GA was implemented as a Windows-based FreePascal application with MySQL connectivity for fetching the data. An outcome study based on 30 runs was carried out with all parameters held constant: sample size, 8; number of variables in the MLR, 2; adaptation-imposed requirements; maximum number of generations, 1000; selection strategy, proportional; probability of mutation, 0.05; number of genes involved in mutation, 2; optimization parameter, r²; optimization score, minimum in sample; and optimization objective, maximum. The results revealed that the number of evolutions followed the Poisson distribution with the sample size as parameter. The average determination coefficient across runs is higher than 98% of the determination coefficient obtained through complete search, and it follows the Gaussian distribution. The correlation coefficients obtained by the best-performing GA-MLR models proved not to be statistically different from the correlation coefficient of the QSPR model obtained by complete search.
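The following is a minimal Python sketch of GA-based descriptor selection for a two-variable MLR, compared against a complete search. It is not the authors' FreePascal implementation. The synthetic data, the function names (`make_data`, `r2_two_var`, `run_ga`, `complete_search`), the crossover rule, the reading of "minimum in sample" as "replace the worst individual when the child scores higher", and the count of accepted replacements as "evolutions" are all assumptions made for illustration. Only the parameter values (sample size 8, two variables, 1000 generations, proportional selection, mutation probability 0.05, two genes per mutation, r² as the optimization parameter) come from the abstract.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def r2_two_var(y, x1, x2):
    """r^2 of the model y ~ b0 + b1*x1 + b2*x2, from pairwise correlations."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    if 1 - r12 ** 2 < 1e-12:  # collinear descriptors: fall back to one variable
        return max(r1 ** 2, r2 ** 2)
    value = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return min(1.0, max(0.0, value))  # guard against rounding error


def make_data(n_obs=40, n_desc=30, seed=1):
    """Synthetic stand-in for a descriptor pool (hypothetical data)."""
    rng = random.Random(seed)
    descriptors = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_desc)]
    y = [2 * descriptors[3][i] - descriptors[17][i] + rng.gauss(0, 0.1)
         for i in range(n_obs)]
    return y, descriptors


def complete_search(y, descriptors):
    """Evaluate every descriptor pair and return the best one with its r^2."""
    def score(p):
        return r2_two_var(y, descriptors[p[0]], descriptors[p[1]])
    best = max(combinations(range(len(descriptors)), 2), key=score)
    return best, score(best)


def run_ga(y, descriptors, sample_size=8, max_generations=1000,
           p_mutation=0.05, seed=0):
    """Assumed GA loop: proportional selection, worst-individual replacement."""
    rng = random.Random(seed)
    n = len(descriptors)

    def fitness(p):
        return r2_two_var(y, descriptors[p[0]], descriptors[p[1]])

    population = [tuple(sorted(rng.sample(range(n), 2)))
                  for _ in range(sample_size)]
    evolutions = 0  # assumed meaning: number of accepted replacements
    for _ in range(max_generations):
        scores = [fitness(p) for p in population]
        # Proportional (roulette-wheel) selection of two parents.
        weights = [s + 1e-12 for s in scores]
        a, b = rng.choices(population, weights=weights, k=2)
        child = [rng.choice(a), rng.choice(b)]  # one gene from each parent
        if rng.random() < p_mutation:
            child = rng.sample(range(n), 2)     # mutation touches both genes
        if child[0] == child[1]:
            continue                            # not a valid two-variable model
        child = tuple(sorted(child))
        worst = min(range(sample_size), key=scores.__getitem__)
        # Raise the minimum score in the sample when the child improves on it.
        if child not in population and fitness(child) > scores[worst]:
            population[worst] = child
            evolutions += 1
    best = max(population, key=fitness)
    return best, fitness(best), evolutions


if __name__ == "__main__":
    y, descriptors = make_data()
    print("complete search:", complete_search(y, descriptors))
    print("genetic algorithm:", run_ga(y, descriptors))
```

Repeating `run_ga` with different `seed` values (for example, 30 of them) and collecting the returned r² and evolution counts would give samples whose distributions could then be compared with the Gaussian and Poisson models mentioned in the abstract.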