Supervised cluster classification using the original n‐dimensional space without transformation into lower dimension
Author(s) -
Al‐Ammar Assad S.,
Barnes Ramon M.
Publication year - 2001
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/1099-128x(200101)15:1<49::aid-cem631>3.0.co;2-2
Subject(s) - cluster analysis , dimensionality reduction , curse of dimensionality , principal component analysis , pattern recognition , artificial neural network , k‐nearest neighbors algorithm , feature vector , data set , hierarchical clustering , data mining , computer science , mathematics , chemistry
A novel supervised classification algorithm, direct clustering in n‐dimensional space (DCNS), was developed for difficult data sets where conventional methods of supervised clustering are expected to fail. When applied to spaces of more than three dimensions, the method performs a special treatment on the measurement space so that the treated space supports a computer‐aided clustering methodology similar to that used by human vision. However, unlike techniques that reduce the dimensionality of the space, the proposed method preserves the original dimensions while performing a computer‐simulated human vision clustering in the original n‐dimensional space. The overlap between clusters that results from dimensionality reduction is thereby eliminated. The proposed method was applied to two real data sets, and the results are compared with those obtained using principal component analysis (PCA), an artificial neural network (ANN) and the k‐nearest‐neighbor (KNN) technique. On one data set containing only two clusters, the DCNS algorithm gives better cluster separation than the other three methods. When all four methods were applied to the second data set, containing eight different clusters, PCA, ANN and KNN were unable to give useful cluster separation, while the DCNS method separated all clusters and assigned the unknown points successfully to their corresponding clusters. The DCNS technique can also perform other important cluster analysis tasks, such as testing the discriminatory power of a variable, selecting one variable from many, and conducting preliminary unsupervised clustering. Copyright © 2000 John Wiley & Sons, Ltd.
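The abstract's central claim is that projecting data into a lower‐dimensional space can merge clusters that are well separated in the original n‐dimensional space. The sketch below illustrates that phenomenon with synthetic data (it is not the authors' DCNS algorithm, whose details are not given in the abstract): two clusters separated only along a third axis collapse onto each other when that axis is discarded, while a simple nearest‐centroid rule in the full space still separates them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic clusters in 3-D, separated only along the third axis.
a = rng.normal([0.0, 0.0, 0.0], 0.3, size=(50, 3))
b = rng.normal([0.0, 0.0, 4.0], 0.3, size=(50, 3))

# Projecting onto the first two axes (a crude stand-in for any
# dimensionality reduction that discards the separating direction)
# collapses the two clusters onto each other.
gap_full = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
gap_proj = np.linalg.norm(a[:, :2].mean(axis=0) - b[:, :2].mean(axis=0))
print(f"centroid gap in 3-D: {gap_full:.2f}")           # large
print(f"centroid gap after projection: {gap_proj:.2f}")  # near zero

# A nearest-centroid classifier working in the original 3-D space
# still separates the clusters cleanly.
centroids = np.stack([a.mean(axis=0), b.mean(axis=0)])
points = np.vstack([a, b])
dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
labels = np.argmin(dists, axis=1)
accuracy = np.mean(labels == np.repeat([0, 1], 50))
print(f"nearest-centroid accuracy in full space: {accuracy:.2f}")
```

The example only demonstrates why preserving the original dimensions can matter; DCNS itself replaces the nearest‐centroid step with its human‐vision‐like clustering treatment, which the abstract does not specify.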