COEFFICIENTS OF ASSOCIATION AND SIMILARITY, BASED ON BINARY (PRESENCE‐ABSENCE) DATA: AN EVALUATION
Author(s) - HUBÁLEK ZDENEK
Publication year - 1982
Publication title - Biological Reviews
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 4.993
H-Index - 165
eISSN - 1469-185X
pISSN - 1464-7931
DOI - 10.1111/j.1469-185x.1982.tb00376.x
Subject(s) - Jaccard index, similarity (geometry), interspecific competition, mathematics, statistics, combinatorics, biology, botany, artificial intelligence, cluster analysis, computer science, image (mathematics)
Summary - Forty‐three association (similarity) coefficients were collected and evaluated in this survey. Some of them are synonyms or direct correlates with earlier described indices (A8, A9, A12, A31, A33), others are mere transforms from one range of values to another (A10, A24, A33). Several coefficients are incompatible with suggested admissibility conditions of the minimum‐maximum value (A13, A16, A27, A28, A29, A31), symmetry (A1, A2, A13, A16, A26), discrimination between positive and negative association (A27, A28, A31) or monotonicity with χ² (A19 to A24); A17 yields very low and erratic values. As a result, 23 coefficients were excluded and the remaining 20 measures were subjected to an empirical trial on interspecific association data among fungi of the genus Chaetomium, with the use of a cluster analysis. The classification produced five main clusters of related coefficients, with several subgroups. It was then demonstrated that representative indices from different clusters yield different dendrograms of interspecific association among Chaetomium, and A34, A14, possibly also A36 and A40 seemed to be less sensible. A set of measures that generally work well (at least in the interspecific association) comprises A4 (Jaccard), A4 (Dice‐Sørensen), A7 (Kulczyński), A11 (Driver‐Kroeber‐Ochiai) and, with some reservation, A30 (Pearson tetrachoric) and A32 (Baroni‐Urbani‐Buser). For some purposes, however, other ‘admissible’ coefficients would be more optimal, and the choice of a measure should be related to the nature of the data. It is tentatively suggested that three or so alternative coefficients be used and the results compared on the same data basis; moreover, significance tests on association should be carried out whenever possible.
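The closing suggestion, computing several alternative coefficients on the same data and comparing them, can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names and the sample data are hypothetical, and the formulas are the commonly cited textbook forms of the Jaccard, Dice‐Sørensen, Kulczyński (second form) and Ochiai coefficients, which are assumed here to correspond to the measures the summary names. The counts follow the usual 2×2 convention: a is the number of joint presences, b and c are the presences of one species only, and d is the number of joint absences.

```python
from math import sqrt


def contingency(x, y):
    """Return the 2x2 counts (a, b, c, d) for two equal-length 0/1 sequences."""
    if len(x) != len(y):
        raise ValueError("both presence-absence vectors must have the same length")
    a = sum(1 for i, j in zip(x, y) if i and j)      # present in both
    b = sum(1 for i, j in zip(x, y) if i and not j)  # present in x only
    c = sum(1 for i, j in zip(x, y) if not i and j)  # present in y only
    d = len(x) - a - b - c                           # absent from both
    return a, b, c, d


def coefficients(x, y):
    """Compute four similarity coefficients that ignore joint absences (d)."""
    a, b, c, _ = contingency(x, y)
    if a + b == 0 or a + c == 0:
        # Assumption: a species that never occurs cannot be compared.
        raise ValueError("each species must be present in at least one sample")
    return {
        "jaccard": a / (a + b + c),
        "dice_sorensen": 2 * a / (2 * a + b + c),
        "kulczynski": 0.5 * (a / (a + b) + a / (a + c)),
        "ochiai": a / sqrt((a + b) * (a + c)),
    }


# Hypothetical presence-absence records of two species across eight samples.
species_1 = [1, 1, 1, 0, 0, 1, 0, 0]
species_2 = [1, 1, 0, 0, 1, 1, 0, 0]

for name, value in coefficients(species_1, species_2).items():
    print(f"{name}: {value:.3f}")
```

For these made-up vectors the counts are a = 3, b = 1, c = 1, d = 3, so the Jaccard value is 0.6 while the other three coefficients all equal 0.75. The example shows only that different measures can rank the same pair differently; it says nothing about which measure suits a given data set.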