
SIMULTANEOUS TOPOLOGICAL CATEGORICAL DATA CLUSTERING AND CLUSTER CHARACTERIZATION
Author(s) -
Lazhar Labiod,
Nistor Grozavu,
Younès Bennani
Publication year - 2011
Publication title -
computing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.184
H-Index - 11
eISSN - 2312-5381
pISSN - 1727-6209
DOI - 10.47839/ijc.10.1.732
Subject(s) - cluster analysis, categorical variable, computer science, data mining, visualization, hierarchical clustering, pattern recognition (psychology), artificial intelligence, machine learning
In this paper we propose a new automatic learning model that performs topological clustering and feature selection simultaneously for categorical datasets. We explore a new topological organization algorithm for categorical data clustering and visualization named RTC (Relational Topological Clustering). Clustering categorical data is generally more difficult than clustering numerical data because categorical values have no natural order. The proposed approach is based on the self-organization principle of Kohonen's model and uses the Relational Analysis formalism, optimizing a cost function defined as a modified Condorcet criterion. We propose an iterative algorithm that handles large datasets in linear time, identifies clusters naturally, and allows the clustering result to be visualized on a two-dimensional grid. The statistical scree test is then used to detect the relevant and correlated features (or modalities) for each prototype. This test detects the most important variables automatically, without requiring any parameters to be set. The proposed approach was validated on various real datasets, and the experimental results show the effectiveness of the proposed procedure.
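To illustrate the Relational Analysis formalism mentioned above, the following minimal sketch evaluates the classical (unmodified) Condorcet criterion for a partition of categorical data. It is not the modified criterion or the RTC algorithm of the paper, neither of which is specified in this abstract. The function names `agreement_matrix` and `condorcet_criterion` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def agreement_matrix(data):
    """c[i, j] = number of attributes on which objects i and j share a category."""
    data = np.asarray(data)
    # Compare every pair of rows attribute by attribute, then count matches.
    return (data[:, None, :] == data[None, :, :]).sum(axis=2)

def condorcet_criterion(data, labels):
    """Classical Condorcet criterion (illustrative, not the paper's modified form).

    Sums, over all ordered pairs (i, j), the agreements c[i, j] for pairs
    placed in the same cluster and the disagreements m - c[i, j] for pairs
    placed in different clusters. Higher values indicate a better partition.
    """
    data = np.asarray(data)
    labels = np.asarray(labels)
    m = data.shape[1]                      # number of categorical attributes
    c = agreement_matrix(data)             # pairwise agreements
    c_bar = m - c                          # pairwise disagreements
    x = (labels[:, None] == labels[None, :]).astype(int)  # 1 if same cluster
    return int((c * x + c_bar * (1 - x)).sum())

# Hypothetical toy dataset: four objects described by two categorical attributes.
toy = [["red", "small"],
       ["red", "small"],
       ["blue", "large"],
       ["blue", "large"]]

good = condorcet_criterion(toy, [0, 0, 1, 1])  # groups identical objects together
bad = condorcet_criterion(toy, [0, 1, 0, 1])   # splits identical objects apart
```

On this toy data the partition that groups identical objects scores higher than the one that separates them, which is the behaviour a Condorcet-style criterion is meant to reward. Note that this direct pairwise evaluation is quadratic in the number of objects; the linear behaviour claimed in the abstract belongs to the authors' algorithm, not to this sketch.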