A clustering algorithm based on the weighted entropy of conditional attributes for mixed data
Author(s) - Zhou Jing, Chen Ke, Liu Jinsheng
Publication year - 2021
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.6293
Subject(s) - cluster analysis , categorical variable , entropy (arrow of time) , adaptability , data mining , conditional entropy , computer science , mathematics , algorithm , pattern recognition (psychology) , artificial intelligence , machine learning , principle of maximum entropy , ecology , physics , quantum mechanics , biology
Summary - A novel definition of weighted entropy is proposed to improve clustering performance on small and diverse datasets. First, intra-class and inter-class weighted entropies are developed separately for categorical and numeric conditional attributes, based on the mathematical definition of entropy. Second, these weighted entropies are used to calculate cluster weights for mixed conditional attributes. A new weighted clustering algorithm, which uses entropy as its primary descriptive term and integrates a corresponding distance calculation mechanism, is then introduced. Finally, a theoretical analysis and validation experiments were conducted using datasets from the UC Irvine (UCI) Machine Learning Repository. The results showed that the proposed algorithm is highly self-adaptive and that its clustering performance was superior to that of the existing K-prototypes, SBAC, and OCIL algorithms.
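The summary does not give the paper's formulas, so what follows is only a minimal sketch of the general idea of entropy-weighted clustering for mixed data. It is not the authors' algorithm. As a stand-in for their intra-/inter-class weighted entropy, it scores each attribute by how much its entropy drops once cluster labels are known, H(A) - H(A | C) (the mutual information between attribute and cluster). Numeric attributes are discretized into equal-width bins so that they can use the same Shannon formula, which is an assumption. The normalized scores then weight a K-prototypes-style distance: squared range-normalized difference for numeric attributes and 0/1 mismatch for categorical ones. All function names and parameters (`bins`, `iters`, first-k-rows initialization) are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (natural log) of a list of hashable values."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def discretize(column, bins=3):
    """Assumed equal-width binning so numeric attributes can use Shannon entropy."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    return [min(int((x - lo) / width), bins - 1) for x in column]


def attribute_weights(rows, labels, numeric_idx, bins=3):
    """Hypothetical weighting: H(A) - size-weighted intra-cluster H(A), normalized."""
    n_attr = len(rows[0])
    gains = []
    for j in range(n_attr):
        col = [r[j] for r in rows]
        if j in numeric_idx:
            col = discretize(col, bins)
        total = shannon_entropy(col)
        intra = 0.0
        for k in set(labels):
            members = [v for v, lab in zip(col, labels) if lab == k]
            intra += len(members) / len(col) * shannon_entropy(members)
        gains.append(max(total - intra, 0.0))
    s = sum(gains)
    if s == 0:
        return [1.0 / n_attr] * n_attr  # no information: fall back to equal weights
    return [g / s for g in gains]


def weighted_distance(x, y, weights, numeric_idx, ranges):
    """Weighted mixed distance: normalized squared gap (numeric) or 0/1 mismatch (categorical)."""
    d = 0.0
    for j, w in enumerate(weights):
        if j in numeric_idx:
            span = ranges[j] or 1.0
            d += w * ((x[j] - y[j]) / span) ** 2
        else:
            d += w * (0.0 if x[j] == y[j] else 1.0)
    return d


def weighted_k_prototypes(rows, k, numeric_idx, iters=10):
    """Alternate assignment, prototype update, and entropy-based reweighting."""
    n_attr = len(rows[0])
    ranges = {j: max(r[j] for r in rows) - min(r[j] for r in rows) for j in numeric_idx}
    protos = [list(rows[i]) for i in range(k)]  # hypothetical init: first k rows
    weights = [1.0 / n_attr] * n_attr
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: weighted_distance(r, protos[c], weights, numeric_idx, ranges))
            for r in rows
        ]
        for c in range(k):
            members = [r for r, lab in zip(rows, new_labels) if lab == c]
            if not members:
                continue
            for j in range(n_attr):
                col = [m[j] for m in members]
                # Mean for numeric attributes, mode for categorical attributes.
                protos[c][j] = sum(col) / len(col) if j in numeric_idx else Counter(col).most_common(1)[0][0]
        weights = attribute_weights(rows, new_labels, numeric_idx)
        if new_labels == labels:
            break
        labels = new_labels
    return new_labels, weights


rows = [(1.0, "a"), (8.0, "b"), (1.2, "a"), (8.3, "b"), (0.9, "a"), (7.9, "b")]
labels, weights = weighted_k_prototypes(rows, k=2, numeric_idx={0})
print(labels, weights)
```

In this sketch an attribute that is nearly uniform inside each cluster but varies across clusters gets a larger weight, which is one plausible reading of "lower intra-class, higher inter-class entropy". The paper's actual weighting may differ.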