Selecting discriminant function models for predicting the expected richness of aquatic macroinvertebrates
Author(s) -
VAN SICKLE, JOHN,
HUFF, DAVID D.,
HAWKINS, CHARLES P.
Publication year - 2006
Publication title -
Freshwater Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.297
H-Index - 156
eISSN - 1365-2427
pISSN - 0046-5070
DOI - 10.1111/j.1365-2427.2005.01487.x
Subject(s) - overfitting , statistics , species richness , discriminant function analysis , linear discriminant analysis , predictive modelling , mean squared error , function (biology) , ecology , mathematics , computer science , machine learning , biology , artificial neural network , evolutionary biology
Summary

1. The predictive modelling approach to bioassessment estimates the macroinvertebrate assemblage expected at a stream site if it were in a minimally disturbed reference condition. The difference between the expected and observed assemblages then measures the site's departure from reference condition.
2. Most predictive models employ site classification, followed by discriminant function (DF) modelling, to predict the expected assemblage from a suite of environmental variables. Stepwise DF analysis is normally used to choose a single subset of DF predictor variables with high accuracy for classifying sites. An alternative is to screen all possible combinations of predictor variables in order to identify several 'best' subsets that yield good overall performance of the predictive model.
3. We applied best-subsets DF analysis to assemblage and environmental data from 199 reference sites in Oregon, U.S.A. Two sets of 66 best DF models containing between one and 14 predictor variables (that is, having model orders from one to 14) were developed, for five-group and 11-group site classifications.
4. Resubstitution classification accuracy of the DF models increased consistently with model order, but cross-validated classification accuracy did not improve beyond seventh- or eighth-order models, suggesting that the larger models were overfitted.
5. Overall predictive model performance at model training sites, measured by the root-mean-squared error of the observed/expected (O/E) species richness ratio, also improved steadily with DF model order. However, high-order DF models usually performed poorly at an independent set of validation sites, another sign of model overfitting.
6. Models selected by stepwise DF analysis showed evidence of overfitting and were outperformed by several of the best-subsets models.
7. The group separation strength of a DF model, as measured by Wilks' Λ, was more strongly correlated with overall predictive model performance at training sites than was DF classification accuracy.
8. Our results suggest improved strategies for developing reliable, parsimonious predictive models. We emphasise the value of independent validation data for obtaining a realistic picture of model performance. We also recommend assessing not just one or two, but several, candidate models, based on their overall performance as well as the performance of their DF component.
9. We provide links to our free software for stepwise and best-subsets DF analysis.
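The two ideas at the core of the abstract, exhaustively screening predictor subsets and ranking them by Wilks' Λ, can be illustrated with a minimal numpy sketch. This is not the authors' software: the data are synthetic, the predictor count is arbitrary, and the ranking uses Wilks' Λ alone (det of within-group scatter over det of total scatter; smaller values mean stronger group separation), without the cross-validation and O/E performance checks the paper recommends.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Hypothetical data: 60 reference sites in 3 classification groups,
# described by 4 candidate environmental predictors.
n_per_group, n_groups, n_pred = 20, 3, 4
X = np.vstack([rng.normal(loc=g, scale=1.0, size=(n_per_group, n_pred))
               for g in range(n_groups)])
groups = np.repeat(np.arange(n_groups), n_per_group)

def wilks_lambda(X, groups):
    """Wilks' Lambda = det(W) / det(T), the ratio of within-group to
    total scatter. Values lie in (0, 1]; smaller = stronger separation."""
    centred = X - X.mean(axis=0)
    T = centred.T @ centred                      # total scatter matrix
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Xg = X[groups == g]
        dev = Xg - Xg.mean(axis=0)
        W += dev.T @ dev                         # within-group scatter
    return np.linalg.det(W) / np.linalg.det(T)

# Best-subsets screening: evaluate every non-empty predictor subset
# and rank by Wilks' Lambda.
scores = {subset: wilks_lambda(X[:, list(subset)], groups)
          for k in range(1, n_pred + 1)
          for subset in combinations(range(n_pred), k)}
best = min(scores, key=scores.get)
print("best subset:", best, "Wilks' Lambda:", round(scores[best], 4))
```

Note that Λ can only decrease as predictors are added to a nested subset, so ranking by Λ alone always favours the largest model; this is exactly why the paper pairs such screening with cross-validation and independent validation sites to detect overfitting.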
