Applying additive modelling and gradient boosting to assess the effects of watershed and reach characteristics on riverine assemblages
Author(s) - Maloney Kelly O., Schmid Matthias, Weller Donald E.
Publication year - 2012
Publication title - Methods in Ecology and Evolution
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.425
H-Index - 105
ISSN - 2041-210X
DOI - 10.1111/j.2041-210x.2011.00124.x
Subject(s) - generalized additive model, species richness, overfitting, generalized linear model, ecology, environmental science, gradient analysis, statistics, mathematics, biology, computer science, machine learning, ordination, artificial neural network
Summary
1. Issues with ecological data (e.g. non‐normality of errors, nonlinear relationships and autocorrelation of variables) and modelling (e.g. overfitting, variable selection and prediction) complicate regression analyses in ecology. Flexible models, such as generalized additive models (GAMs), can address data issues, and machine learning techniques (e.g. gradient boosting) can help resolve modelling issues. Gradient boosted GAMs do both. Here, we illustrate the advantages of this technique using data on benthic macroinvertebrates and fish from 1573 small streams in Maryland, USA.
2. We assembled a predictor matrix of 15 watershed attributes (e.g. ecoregion and land use), 15 stream attributes (e.g. width and habitat quality) and location (latitude and longitude). We built boosted and conventionally estimated GAMs for macroinvertebrate richness and for the relative abundances of macroinvertebrates in the Orders Ephemeroptera, Plecoptera and Trichoptera (%EPT); individuals that cling to substrate (%Clingers); and individuals in the collector/gatherer functional feeding group (%Collectors). For fish, models were constructed for taxonomic richness, benthic species richness, biomass and the relative abundance of tolerant individuals (%Tolerant Fish).
3. For several of the responses, boosted GAMs had lower pseudo R² values than conventional GAMs for in‐sample data but larger pseudo R² values for out‐of‐bootstrap data, suggesting boosted GAMs do not overfit the data and have higher prediction accuracy than conventional GAMs. The models explained most variation in fish richness (pseudo R² = 0·97), least variation in %Clingers (pseudo R² = 0·28) and intermediate amounts of variation in the other responses (pseudo R² values between 0·41 and 0·60). Many relationships of macroinvertebrate responses to anthropogenic measures and natural watershed attributes were nonlinear. Fish responses were related to system size and local habitat quality.
4. For impervious surface, models predicted below model‐average macroinvertebrate richness at levels above c. 3·0%, lower %EPT above c. 1·5%, and lower %Clingers for levels above c. 2·0%. Impervious surface did not affect %Collectors or any fish response. Prediction functions for %EPT and fish richness increased linearly with log₁₀(watershed area), %Tolerant Fish decreased with log₁₀(watershed area), and benthic fish richness and biomass both increased nonlinearly with log₁₀(watershed area).
5. Gradient boosting optimizes the predictive accuracy of GAMs while preserving the structure of conventional GAMs, so that predictor–response relationships are more interpretable than with other machine learning methods. Boosting also avoids overfitting the data (by shrinking effect estimates towards zero and by performing variable selection), thus avoiding spurious predictor effects and interpretations. Thus, in many ecological settings, it may be reasonable to use boosting instead of conventional GAMs.
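
The shrinkage and variable-selection behaviour described in the summary can be illustrated with a minimal sketch of componentwise gradient boosting under squared-error loss. This is not the authors' implementation: the function names (`poly_basis`, `fit_componentwise_boost`, `predict`), the cubic polynomial base learners (a simple stand-in for the smooth terms of a GAM), the fixed number of iterations, the step length `nu` and the synthetic data are all assumptions made for illustration. At each iteration only the single predictor whose base learner best fits the current residuals is updated, and only by a fraction `nu` of its least-squares fit, so effect estimates are shrunk and predictors that are never selected keep a zero effect.

```python
import numpy as np

def poly_basis(z, degree):
    # Columns 1, z, z^2, ..., z^degree for one standardised predictor.
    return np.column_stack([z ** d for d in range(degree + 1)])

def fit_componentwise_boost(X, y, n_iter=20, nu=0.1, degree=3):
    """Componentwise L2 boosting with one polynomial base learner per predictor.

    Hypothetical helper for illustration; polynomial bases stand in for
    the smooth base learners of a boosted GAM.
    """
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd                      # standardise for numerical stability
    bases = [poly_basis(Z[:, j], degree) for j in range(p)]
    solvers = [np.linalg.pinv(B) for B in bases]  # least-squares solvers, one per predictor
    offset = y.mean()                      # start from the mean response
    resid = y - offset
    coefs = np.zeros((p, degree + 1))      # accumulated (shrunken) coefficients
    counts = np.zeros(p, dtype=int)        # how often each predictor was selected
    for _ in range(n_iter):
        # Fit every base learner to the current residuals.
        betas = [S @ resid for S in solvers]
        rss = [np.sum((resid - B @ b) ** 2) for B, b in zip(bases, betas)]
        j = int(np.argmin(rss))            # keep only the best-fitting predictor
        coefs[j] += nu * betas[j]          # shrunken update of its effect
        resid = resid - nu * (bases[j] @ betas[j])
        counts[j] += 1
    return {"mu": mu, "sd": sd, "offset": offset, "coefs": coefs, "counts": counts}

def predict(model, X):
    # Additive prediction: offset plus one partial effect per predictor.
    Z = (X - model["mu"]) / model["sd"]
    out = np.full(X.shape[0], model["offset"])
    for j, c in enumerate(model["coefs"]):
        out += poly_basis(Z[:, j], len(c) - 1) @ c
    return out

# Synthetic example (assumed data): only the first of three predictors matters.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 3))
y = X[:, 0] ** 2 + rng.normal(0.0, 0.1, size=500)

model = fit_componentwise_boost(X, y, n_iter=20, nu=0.1)
print("selection counts per predictor:", model["counts"])
```

Because the fitted model remains a sum of one partial effect per predictor, each effect can still be plotted and interpreted separately, as with a conventional GAM. In practice the number of boosting iterations would be tuned on held-out data (for example, out-of-bootstrap samples) rather than fixed in advance as it is here.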