Unsupervised Classification of Chemical Compounds
Author(s) -
Gutiérrez Toscano P.,
Marriott F. H. C.
Publication year - 1999
Publication title -
Journal of the Royal Statistical Society: Series C (Applied Statistics)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.205
H-Index - 72
eISSN - 1467-9876
pISSN - 0035-9254
DOI - 10.1111/1467-9876.00146
Subject(s) - multidimensional scaling, cluster analysis, computer science, fingerprint (computing), data mining, pattern recognition (psychology), scaling, binary data, binary number, data set, metric (unit), cluster (spacecraft), metric space, set (abstract data type), coding (social sciences), artificial intelligence, mathematics, machine learning, statistics, discrete mathematics, engineering, operations management, geometry, arithmetic, programming language
Clustering chemical compounds of similar structure is important in the pharmaceutical industry. One way of describing the structure is the chemical 'fingerprint'. The fingerprint is a string of binary digits, and typical data sets consist of very large numbers of fingerprints; a suitable clustering procedure must take account of the properties of this method of coding, and must be able to handle large data sets. This paper describes the analysis of a set of fingerprint data. The analysis was based on an appropriate distance measure derived from the fingerprints, followed by metric scaling into a low‐dimensional space. An approximation to metric scaling, suitable for very large data sets, was investigated. Cluster analysis using two programs, mclust and AutoClass‐C, was carried out on the scaled data.
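The first two steps of the pipeline in the abstract (a distance derived from binary fingerprints, then metric scaling into a low-dimensional space) can be illustrated with a minimal sketch. The abstract does not say which distance measure the authors used, so the Tanimoto (Jaccard) distance below is an assumption, chosen only because it is a common choice for binary fingerprints. The function names, the toy fingerprints, and the choice of two dimensions are likewise hypothetical, and the scaling shown is the textbook classical (Torgerson) method, not the large-data approximation the paper investigates.

```python
import numpy as np


def tanimoto_distance(fp):
    """Pairwise 1 - Tanimoto similarity for a 0/1 array of shape (n, bits).

    Assumption: the paper's actual distance measure is not given in the
    abstract; Tanimoto is used here purely for illustration.
    """
    fp = np.asarray(fp, dtype=float)
    common = fp @ fp.T                    # bits set in both fingerprints
    counts = fp.sum(axis=1)               # bits set in each fingerprint
    union = counts[:, None] + counts[None, :] - common
    # Two all-zero fingerprints have an empty union; treat them as identical.
    sim = np.divide(common, union, out=np.ones_like(common), where=union > 0)
    return 1.0 - sim


def classical_scaling(dist, k=2):
    """Classical metric scaling: embed n points in k dimensions."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to get an inner-product matrix.
    b = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    # Negative eigenvalues can occur for non-Euclidean distances; clip them.
    top = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top)


# Hypothetical toy data: four 6-bit fingerprints forming two obvious groups.
fingerprints = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
])
dist = tanimoto_distance(fingerprints)
coords = classical_scaling(dist, k=2)
```

The rows of `coords` are the kind of scaled data that a model-based clustering program such as mclust or AutoClass-C would then take as input; that final step is not reproduced here. Note that the full n-by-n distance matrix becomes impractical for very large n, which is presumably the motivation for the approximation mentioned in the abstract.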
