
Efficient EK-means: Extended K-means Clustering for Categorical data with High Processing Speed
Author(s) - Xi Chen, Shuo Peng
Publication year - 2020
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1584/1/012074
Subject(s) - categorical variable, cluster analysis, computer science, data mining, scalability, range (aeronautics), table (database), limit (mathematics), data processing, algorithm, artificial intelligence, machine learning, mathematics, database, engineering, mathematical analysis, aerospace engineering
K-means, the typical representative of hard clustering algorithms, is one of the fastest clustering algorithms and scales well. However, it cannot handle categorical attributes, and the ability to handle them is an important criterion for judging a clustering algorithm. This gap severely limits the range of data that K-means can process. This paper proposes a clustering algorithm extended from K-means. The algorithm introduces a Pseudo-mean distance calculation formula and a counting table, so that categorical attributes can be processed while keeping the time cost as low as possible. Experimental results show that the proposed Pseudo-means extends the processing range of K-means to categorical data, and that the counting table effectively reduces the time cost.
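
The abstract does not give the Pseudo-mean formula or the layout of the counting table, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that each cluster keeps a table of per-attribute category counts, and that the distance from a record to a cluster is the sum over attributes of one minus the relative frequency of the record's value in that cluster. The names `CountingTable` and `cluster_categorical`, the seeding from the first `k` records, and the distance itself are hypothetical choices made for illustration.

```python
from collections import Counter


class CountingTable:
    """Per-cluster counts of category values, one Counter per attribute.

    Hypothetical layout: the paper's actual counting table may differ.
    """

    def __init__(self, n_attrs):
        self.size = 0
        self.counts = [Counter() for _ in range(n_attrs)]

    def add(self, row):
        self.size += 1
        for j, value in enumerate(row):
            self.counts[j][value] += 1

    def remove(self, row):
        self.size -= 1
        for j, value in enumerate(row):
            self.counts[j][value] -= 1

    def distance(self, row):
        # Assumed dissimilarity (not the paper's formula): for each attribute,
        # 1 - relative frequency of the row's value inside this cluster.
        if self.size == 0:
            return float(len(row))
        return sum(1.0 - self.counts[j][value] / self.size
                   for j, value in enumerate(row))


def cluster_categorical(rows, k, max_iter=20):
    """K-means-style loop over categorical rows using counting tables."""
    n_attrs = len(rows[0])
    tables = [CountingTable(n_attrs) for _ in range(k)]
    labels = [None] * len(rows)

    # Assumed seeding: the first k rows start one cluster each.
    for c in range(k):
        tables[c].add(rows[c])
        labels[c] = c

    for _ in range(max_iter):
        changed = False
        for i, row in enumerate(rows):
            best = min(range(k), key=lambda c: tables[c].distance(row))
            if best != labels[i]:
                # Update only the two affected tables instead of
                # recomputing every cluster summary from scratch.
                if labels[i] is not None:
                    tables[labels[i]].remove(row)
                tables[best].add(row)
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels, tables
```

In this sketch the table plays the role that the mean plays in ordinary K-means: moving a record between clusters costs one decrement and one increment per attribute, which is one plausible way a counting table could reduce time cost.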