Cocktail clustering – a new hierarchical agglomerative algorithm for extracting species groups in vegetation databases
Author(s) -
Bruelheide Helge
Publication year - 2016
Publication title -
Journal of Vegetation Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.1
H-Index - 115
eISSN - 1654-1103
pISSN - 1100-9233
DOI - 10.1111/jvs.12454
Subject(s) - vegetation (pathology) , hierarchical clustering , vegetation classification , cluster analysis , hierarchy , geography , set (abstract data type) , group (periodic table) , global biodiversity , mathematics , ecology , database , computer science , statistics , biology , biodiversity , chemistry , medicine , organic chemistry , pathology , economics , market economy , programming language
Aims In one approach to formalized vegetation classification, species groups define a vegetation unit as a set of relevés, each of which possesses a minimum number of species from that group. Species groups thus provide unequivocal rules for assigning individual vegetation records to vegetation units, and these rules can be applied beyond the set of records used to define them. Here, I present a new method that clusters all species in a vegetation database and produces such clear rules for assigning vegetation records to clusters. More specifically, the algorithm derives species groups from a species × relevé matrix. Each group consists of the species that show the highest probability of co‐occurring with each other, and each comes with an unequivocal rule for assigning relevés to it.
Methods A hierarchical agglomerative clustering algorithm for species is presented that starts with a species × species matrix of the Φ coefficient of association. After the two species with the highest Φ coefficient are fused, the Φ association matrix is recalculated for the new group of species. To calculate the Φ association between a group and other species, or between a group and nodes formed by other groups of species, the observed frequency distribution of co‐occurrences of the species in that group is compared to the expected frequency distribution of co‐occurrences, derived from the observed number of species occurrences. As a result, each species group receives a minimum number of species that a relevé must contain to be assigned to that group. The resulting Cocktail species groups are partially nested and, with increasing node hierarchy, tend to show a decreasing Φ correlation with the species that joined the group last.
Results and Conclusion As the clustering algorithm assigns all of the n species in a data set to groups, the result is n − 1 partly nested species groups. These groups correspond to species groups that have been extracted from the same data sets using preconceived start groups.
Subsequently, the species groups can be used separately or in logical combinations to classify vegetation relevés, either with expert systems, with TWINSPAN‐like classification algorithms, or by redefining existing vegetation units with automated brute‐force matching algorithms. Used in this way, Cocktail clustering can form the backbone of a consistent large‐scale vegetation classification system.
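The two building blocks named in the abstract, the Φ coefficient of association and the "minimum number of species" membership rule, can be illustrated with a minimal Python sketch. It is not the author's implementation. The function names, the boolean species × relevé array layout and the threshold argument `m` are assumptions made for illustration, and `m` is simply passed in rather than derived from the observed and expected co‐occurrence distributions as the method describes. The sketch uses the standard 2 × 2 contingency-table form of Φ.

```python
import numpy as np


def phi_coefficient(x, y):
    """Phi coefficient of association between two presence/absence vectors."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    a = int(np.sum(x & y))    # present in both
    b = int(np.sum(x & ~y))   # present in x only
    c = int(np.sum(~x & y))   # present in y only
    d = int(np.sum(~x & ~y))  # absent from both
    denom = np.sqrt(float((a + b) * (c + d) * (a + c) * (b + d)))
    if denom == 0.0:
        # One vector is constant, so the association is undefined; 0.0 is an
        # arbitrary convention chosen for this sketch.
        return 0.0
    return (a * d - b * c) / denom


def group_membership(matrix, group_rows, m):
    """Flag relevés (columns) containing at least m species of the group.

    matrix     : species x relevé presence/absence array (assumed layout)
    group_rows : row indices of the species forming the group
    m          : minimum number of group species required (taken as given)
    """
    matrix = np.asarray(matrix, dtype=bool)
    counts = matrix[group_rows].sum(axis=0)  # group species per relevé
    return counts >= m


def group_phi(matrix, group_rows, m, species_row):
    """Phi between a group's relevé membership vector and one further species."""
    matrix = np.asarray(matrix, dtype=bool)
    return phi_coefficient(group_membership(matrix, group_rows, m),
                           matrix[species_row])
```

In this sketch a fused group is treated like a single "virtual species" whose occurrence vector is the membership flag returned by `group_membership`, so the same Φ formula applies to species and to groups. A full agglomerative loop would repeatedly fuse the pair with the highest Φ and recompute the affected associations; that loop and the derivation of `m` are left out here.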