
K-modes Algorithm Based on Rough Set and Information Entropy
Author(s) -
Gong Xingyu,
Ke Cao,
Ping Jia,
Shimin Gong
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1754/1/012239
Subject(s) - cluster analysis , rough set , data mining , entropy (arrow of time) , computer science , algorithm , canopy clustering algorithm , correlation clustering , artificial intelligence , mathematics , pattern recognition (psychology) , physics , quantum mechanics
The traditional K-modes algorithm is susceptible to interference from redundant attributes, and it uses only simple 0-1 matching to define the distance between the attribute values of two objects, without fully considering the influence of each categorical attribute on the clustering result. To overcome these shortcomings, this paper proposes an improved K-modes clustering algorithm based on rough sets and information entropy. To handle the large number of redundant attributes in clustering data, the method first applies a rough-set attribute reduction algorithm to eliminate redundant attributes and determine the importance of each attribute, and then incorporates information gain to determine the weight of each attribute. Finally, the traditional and improved algorithms are compared on five data sets from the UCI Machine Learning Repository, including Soybean-Small and Zoo. The experimental results show that the improved algorithm achieves higher clustering efficiency and accuracy than the traditional algorithm.
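The contrast between 0-1 matching and an attribute-weighted distance can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so the weighting below is an assumption: each attribute's weight is simply its Shannon entropy normalised over all attributes, used as a stand-in for the rough-set importance and information-gain weighting that the paper describes. The function names (`simple_matching`, `entropy_weights`, `weighted_matching`) are hypothetical and chosen for illustration only.

```python
from collections import Counter
from math import log2


def simple_matching(x, y):
    """Traditional K-modes dissimilarity: number of attributes that differ."""
    return sum(a != b for a, b in zip(x, y))


def attribute_entropy(column):
    """Shannon entropy (in bits) of one categorical attribute column."""
    n = len(column)
    return sum((c / n) * log2(n / c) for c in Counter(column).values())


def entropy_weights(rows):
    """Assumed weighting scheme: entropy of each attribute, normalised to sum to 1.

    This is NOT the paper's formula; it only illustrates the idea that
    attributes carrying no information should contribute nothing.
    """
    columns = list(zip(*rows))
    entropies = [attribute_entropy(col) for col in columns]
    total = sum(entropies)
    if total == 0:
        # Every attribute is constant: fall back to uniform weights.
        return [1 / len(columns)] * len(columns)
    return [h / total for h in entropies]


def weighted_matching(x, y, weights):
    """Weighted 0-1 matching: a mismatch on attribute j costs weights[j]."""
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


# Toy data: the third attribute is constant, so it receives zero weight.
rows = [
    ("red", "small", "yes"),
    ("red", "large", "yes"),
    ("blue", "small", "yes"),
    ("blue", "large", "yes"),
]
weights = entropy_weights(rows)
print(simple_matching(rows[0], rows[3]))
print(weighted_matching(rows[0], rows[3], weights))
```

In this sketch a constant attribute receives zero weight and never affects the distance, which loosely mirrors the effect of removing a redundant attribute; the paper's rough-set reduction removes such attributes explicitly before weighting.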