Open Access
Discovering biogeographic and ecological clusters with a graph theoretic spin on factor analysis
Author(s) - Alroy, John
Publication year - 2019
Publication title - Ecography
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.973
H-Index - 128
eISSN - 1600-0587
pISSN - 0906-7590
DOI - 10.1111/ecog.04464
Subject(s) - set (abstract data type), computer science, graph, outlier, raw data, cluster (spacecraft), binary number, multidimensional scaling, metric (unit), ecology, mathematics, statistics, theoretical computer science, biology, artificial intelligence, operations management, arithmetic, economics, programming language
Factor analysis (FA) has the advantage of highlighting each semi‐distinct cluster of samples in a data set with one axis at a time, as opposed to simply arranging samples across axes to represent gradients. However, in the case of presence–absence data it is confounded by absences when gradients are long. No statistical model can cope with this problem because the raw data simply do not present underlying information about the length of such gradients. Here I propose an easy way to tease out this information. It is a simple emendation of FA called stepping down, which involves giving an absence a negative value when the missing species nowhere co‐occurs with the species found in the relevant sample. Specifically, a binary co‐occurrence graph is created, and the magnitude of negative values is made a function of how far the graph must be traversed in order to link the missing species with each species that is present. Simulations show that standard FA yields inferior results to FA based on stepped‐down matrices in terms of mapping clusters into axes one‐by‐one. Standard FA is also uninformative when applied to a global bat inventory data set. Step‐down FA (SDFA) easily flags the main biogeographic groupings. Methods like correspondence analysis, non‐metric multidimensional scaling, and Bayesian latent variable modelling are not commensurate with SDFA because they do not seek to find a one‐to‐one mapping of axes and clusters. Stepping down seems promising as a means of illustrating clusters of samples, especially when there are subtle or complex discontinuities in gradients.
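The stepping-down idea can be illustrated with a minimal sketch. The abstract does not give the exact function that turns graph distance into a negative value, so the choices below are assumptions made for illustration only: an absence is scored against the nearest present species, the value is one minus the shortest-path length in the co-occurrence graph (so a direct co-occurrence leaves the absence at zero), and a species that cannot be reached at all is given a distance one step longer than the longest finite path. The function names (cooccurrence_graph, graph_distances, step_down) are hypothetical, and the factor analysis step itself is not shown.

```python
from collections import deque


def cooccurrence_graph(matrix):
    """Adjacency sets: species a and b are linked if any sample holds both."""
    n_species = len(matrix[0])
    adj = [set() for _ in range(n_species)]
    for row in matrix:
        present = [j for j, v in enumerate(row) if v]
        for a in present:
            for b in present:
                if a != b:
                    adj[a].add(b)
    return adj


def graph_distances(adj, source):
    """Breadth-first search from one species; unreachable species stay None."""
    dist = [None] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def step_down(matrix):
    """Return a copy of a samples-by-species 0/1 matrix with stepped-down absences."""
    adj = cooccurrence_graph(matrix)
    dist = [graph_distances(adj, s) for s in range(len(adj))]
    finite = [d for row in dist for d in row if d is not None]
    cap = max(finite) + 1  # assumed stand-in distance for unreachable species
    out = []
    for row in matrix:
        present = [j for j, v in enumerate(row) if v]
        new_row = []
        for j, v in enumerate(row):
            if v or not present:
                # Presences (and rows with no species at all) are left unchanged.
                new_row.append(float(v))
                continue
            # Assumption: use the shortest path to the nearest present species.
            nearest = min(cap if dist[j][p] is None else dist[j][p] for p in present)
            # Distance 1 (direct co-occurrence somewhere) gives 0; longer paths go negative.
            new_row.append(float(1 - nearest))
        out.append(new_row)
    return out


# Three samples along a gradient of four species (rows are samples).
samples = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
for stepped_row in step_down(samples):
    print(stepped_row)
```

In this toy gradient, the first sample's absence of the fourth species is set to -1 because that species is two steps away from the nearest present species, while absences of species that co-occur somewhere with a present species remain at zero. The resulting matrix, rather than the raw binary one, would then be passed to a standard factor analysis.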