Structural bias in aggregated species‐level variables driven by repeated species co‐occurrences: a pervasive problem in community and assemblage data
Author(s) -
Hawkins Bradford A.,
Leroy Boris,
Rodríguez Miguel Á.,
Singer Alexander,
Vilela Bruno,
Villalobos Fabricio,
Wang Xiangping,
Zelený David
Publication year - 2017
Publication title -
Journal of Biogeography
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.7
H-Index - 158
eISSN - 1365-2699
pISSN - 0305-0270
DOI - 10.1111/jbi.12953
Subject(s) - species richness, generalized additive model, statistics, range (aeronautics), regression, regression analysis, ecology, variance (accounting), multivariate statistics, breeding bird survey, spatial analysis, random forest, spatial ecology, econometrics, mathematics, geography, computer science, biology, habitat, artificial intelligence, materials science, accounting, business, composite material
Abstract
Aim Species attributes are often used to explain diversity patterns across assemblages/communities. However, repeated species co‐occurrences can generate spatial pattern and strong statistical relationships between aggregated attributes and richness in the absence of biological information. Our aim is to increase awareness of this problem.
Location North America.
Methods We generated empirical species richness patterns using two data structures: (1) birds gridded from range maps and (2) tree communities from the US Forest Service's Forest Inventory and Analysis. We analysed richness using linear regression, regression trees, generalized additive models, geographically weighted regression and simultaneous autoregression, with ‘random intrinsic variables’ as predictors generated by assigning random numbers to species and calculating averages in assemblages. We then generated simulations in which species with cohesive or patchy distributions are placed with respect to the North American temperature gradient with or without a broad‐scale richness gradient. Random intrinsic variables are again used as predictors of richness. Finally, we analysed one simulated scenario with random intrinsic variables as both response and predictor variables.
Results The models of bird and tree richness often explained moderate to large proportions of the variance. Regression trees, geographically weighted regression and simultaneous autoregression were very sensitive to the problem; generalized additive models were moderately affected, as was multiple regression to a lesser extent. In the virtual data, the variance explained increased with increasing species co‐occurrences, but neither range cohesion, a richness gradient nor spatial autocorrelation in predictors had major impacts on the variance explained. The problem persisted when the response variable was also a random intrinsic variable.
Main conclusions Repeated species co‐occurrences can generate strong spurious relationships between richness and aggregated species attributes. It is important to realize that models utilizing assemblage variables aggregated from species‐level values, as well as maps illustrating their spatial patterns, cannot be taken at face value.
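Illustration The 'random intrinsic variable' described in the Methods (random numbers assigned to species, then averaged within assemblages) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' code: the function names, the one-dimensional toy transect and its parameters (200 sites, 50 species, contiguous ranges) are all hypothetical, and a simple squared Pearson correlation stands in for the regression methods used in the paper.

```python
import random


def assemblage_mean(pa, species_values):
    """Average a species-level value over the species present at each site.

    pa: list of rows (one per site) of 0/1 flags, one column per species.
    Sites with no species get None.
    """
    means = []
    for row in pa:
        present = [v for flag, v in zip(row, species_values) if flag]
        means.append(sum(present) / len(present) if present else None)
    return means


def random_intrinsic_variable(pa, rng):
    """Assign each species a random number, then average within each site."""
    n_species = len(pa[0])
    return assemblage_mean(pa, [rng.random() for _ in range(n_species)])


def r_squared(x, y):
    """Squared Pearson correlation (R^2 of a simple linear regression).

    Pairs in which either value is None (empty sites) are dropped.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy ** 2 / (sxx * syy)


def toy_transect(n_sites, n_species, rng):
    """Hypothetical data: each species occupies one contiguous run of sites."""
    pa = [[0] * n_species for _ in range(n_sites)]
    for j in range(n_species):
        start = rng.randrange(n_sites)
        end = min(n_sites, start + rng.randint(5, n_sites // 2))
        for i in range(start, end):
            pa[i][j] = 1
    return pa


def demo(seed=0, n_draws=20):
    """R^2 between richness and several independent random intrinsic variables."""
    rng = random.Random(seed)
    pa = toy_transect(200, 50, rng)
    richness = [sum(row) for row in pa]
    return [
        r_squared(random_intrinsic_variable(pa, rng), richness)
        for _ in range(n_draws)
    ]
```

Calling `demo()` returns one R² value per random draw. Because the species values carry no biological information, any draw with a sizeable R² is the kind of spurious relationship the abstract warns about; how large those values are in this toy setting depends on the assumed range structure.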