Comparison between physicochemical and calculated molecular descriptors †
Author(s) -
Andersson Per M.,
Sjöström Michael,
Wold Svante,
Lundstedt Torbjörn
Publication year - 2000
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/1099-128x(200009/12)14:5/6<629::aid-cem606>3.0.co;2-m
Subject(s) - molecular descriptor, principal component analysis, set (abstract data type), partial least squares regression, latent variable, quantitative structure–activity relationship, data set, mathematics, data mining, pattern recognition (psychology), biological system, computer science, chemistry, artificial intelligence, statistics, machine learning, biology, programming language
Measured physicochemical properties have previously been shown to be useful for selecting building blocks in combinatorial chemistry and for investigating the scope and limitations of organic reactions. However, measured properties are available only for small subsets of reagents, starting materials or building blocks, so calculated descriptors must be used, and it is essential that these descriptors are relevant. The objective was to investigate whether three different descriptor data sets contain similar information about chemical structure, with the main aim of determining whether calculated descriptors carry information similar to that in experimental data. A total of 205 heterogeneous primary amines were characterized using three different sets of molecular descriptor variables. The first set consisted of four physicochemical variables compiled from the literature and from chemical catalogues of commercially available chemicals. From these four descriptors together with molecular weight, three additional descriptors could be calculated, giving a total of eight descriptor variables in the first data set. The second data set consisted of 81 calculated molecular descriptor variables relating to size, connectivity, atom counts, topology and electrotopological indices. The third data set consisted of 10 semi‐empirical (AM1) variables. All calculated variables were generated using the software Tsar 3.11. The descriptor sets were compared using principal component analysis (PCA) and partial least squares projections to latent structures (PLS). The results show that the different descriptor sets do contain similar latent information and that the different types of calculated variables correlate well with the experimental data, making them suitable for use in, e.g., combinatorial library design. Copyright © 2000 John Wiley & Sons, Ltd.
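The idea of checking whether two descriptor sets carry similar latent information can be illustrated with a minimal sketch. It is not the authors' procedure, software or data: the two blocks below are synthetic, the function names (`autoscale`, `first_pc_scores`, `pearson`, `make_blocks`) are hypothetical, and only the first principal component of each block is compared, whereas the study used full PCA and PLS models.

```python
import math
import random


def autoscale(X):
    """Center each column to zero mean and scale it to unit variance."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        cols.append([(v - mean) / sd for v in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def first_pc_scores(X, iters=200):
    """Scores on the first principal component, found by power iteration."""
    Z = autoscale(X)
    n, p = len(Z), len(Z[0])
    w = [1.0] * p  # initial guess for the loading vector
    for _ in range(iters):
        t = [sum(z[j] * w[j] for j in range(p)) for z in Z]            # scores
        w = [sum(Z[i][j] * t[i] for i in range(n)) for j in range(p)]  # loadings
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return [sum(z[j] * w[j] for j in range(p)) for z in Z]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def make_blocks(n=50, seed=0):
    """Synthetic stand-ins for a 'measured' and a 'calculated' descriptor block.

    ASSUMPTION: both blocks are driven by one hidden property plus noise.
    These are illustrative numbers, not data from the study.
    """
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    measured = [[2.0 * s + rng.gauss(0, 0.3),
                 -1.0 * s + rng.gauss(0, 0.3),
                 0.5 * s + rng.gauss(0, 0.3)] for s in latent]
    calculated = [[1.0 * s + rng.gauss(0, 0.3),
                   3.0 * s + rng.gauss(0, 0.3),
                   -1.0 * s + rng.gauss(0, 0.3),
                   0.8 * s + rng.gauss(0, 0.3)] for s in latent]
    return measured, calculated


measured, calculated = make_blocks()
r = pearson(first_pc_scores(measured), first_pc_scores(calculated))
# The sign of a principal component is arbitrary, so compare |r|.
print(f"|r| between first-component scores: {abs(r):.3f}")
```

A value of |r| close to 1 would suggest that the two blocks describe the same dominant latent variable. With real descriptor data, several components would normally be examined, and a PLS model between the blocks would give a more direct measure of how well one set predicts the other.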
