Open Access

Associations among similarity and distance measures for binary data in cluster analysis

Author(s) - Jana Cibulková, Zdeněk Šulc, Hana Řezanková, Sergej Sirota

Publication year - 2020

Publication title - Advances in Methodology and Statistics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.127

H-Index - 7

eISSN - 1854-0031

pISSN - 1854-0023

DOI - 10.51936/yelx5179

Subject(s) - similarity (geometry), cluster analysis, hierarchical clustering, computer science, binary number, data mining, binary data, cluster (spacecraft), complete linkage clustering, object (grammar), consensus clustering, process (computing), fuzzy clustering, artificial intelligence, mathematics, cure data clustering algorithm, image (mathematics), arithmetic, programming language, operating system

The paper examines similarity and distance measures for binary data and their application in cluster analysis. It analyzes 66 such measures in order to give a comprehensive, well-arranged overview of the topic, starting from the formulas that define them. The second part of the research compares the results of clustering objects in generated datasets and evaluates how far the measures produce similar or identical clustering solutions. The comparison uses selected internal and external evaluation criteria, together with the assignments of objects to clusters during hierarchical clustering. The paper shows which similarity and distance measures for binary data lead to similar or even identical results in hierarchical cluster analysis.
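
To illustrate how two binary similarity measures can lead to identical clustering results, here is a minimal Python sketch. It is not the authors' code or experimental setup. It assumes the common 2x2 notation for a pair of binary vectors (a = both 1, b = 1 and 0, c = 0 and 1, d = both 0) and uses three textbook measures chosen for illustration: Jaccard, Dice and simple matching. The toy dataset and all function names are hypothetical. Because Dice equals 2J / (1 + J), where J is the Jaccard coefficient, the two measures always rank pairs of objects in the same order.

```python
from itertools import combinations


def counts(x, y):
    """Return the 2x2 agreement counts (a, b, c, d) for two 0/1 vectors."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def jaccard(x, y):
    """Jaccard similarity: a / (a + b + c); joint absences are ignored."""
    a, b, c, _ = counts(x, y)
    return a / (a + b + c) if (a + b + c) else 1.0


def dice(x, y):
    """Dice similarity: 2a / (2a + b + c); joint presences count double."""
    a, b, c, _ = counts(x, y)
    return 2 * a / (2 * a + b + c) if (a + b + c) else 1.0


def simple_matching(x, y):
    """Simple matching: (a + d) / (a + b + c + d); absences count too."""
    a, b, c, d = counts(x, y)
    return (a + d) / (a + b + c + d)


def pair_ranking(data, measure):
    """Order all object pairs from most to least similar (ties by index)."""
    pairs = list(combinations(range(len(data)), 2))
    return sorted(pairs, key=lambda p: (-measure(data[p[0]], data[p[1]]), p))


# Hypothetical toy dataset: five objects described by six binary variables.
data = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0],
]

same_order = pair_ranking(data, jaccard) == pair_ranking(data, dice)
print("Jaccard and Dice rank pairs identically:", same_order)
print("Simple matching ranking:", pair_ranking(data, simple_matching))
```

Linkage methods that depend only on the order of the pairwise dissimilarities, such as single and complete linkage, therefore merge objects in the same sequence under Jaccard and Dice. Simple matching also counts joint absences (d), so its ranking is not tied to the other two in this way.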
