General models for resource use or other compositional count data using the Dirichlet‐multinomial distribution
Author(s) - de Valpine, Perry; Harmon-Threatt, Alexandra N.
Publication year - 2013
Publication title - Ecology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.144
H-Index - 294
eISSN - 1939-9170
pISSN - 0012-9658
DOI - 10.1890/12-0416.1
Subject(s) - count data, multinomial distribution, dirichlet distribution, statistics, econometrics, ecology, mathematics, computer science, biology, poisson distribution, mathematical analysis, boundary value problem
Many ecological studies investigate how organisms use resources, such as habitats or foods, in relation to availability or other variables. Related statistical problems include analysis of proportions of species or genotypes in a community or population. These require statistical modeling of compositional count data: data on relative proportions of each category collected as counts. Common methods for analyzing compositional count data lack one or more important considerations. Some methods lack explicit accommodation of count data, dealing instead with proportions. Others do not handle between‐sample heterogeneity for overdispersed data. Yet others do not allow general types of relationships between explanatory variables and resource use. All three components have been combined in a Bayesian framework, but for frequentist hypothesis tests and AIC model selection, maximum‐likelihood estimation is needed. Here we propose the Dirichlet‐multinomial distribution to accommodate overdispersed compositional count data. This approach can be used flexibly in combination with explanatory models, but the only correlations among compositional proportions that it can accommodate are the negative correlations due to the fact that proportions must sum to 1. Many existing models can be generalized to use the Dirichlet‐multinomial distribution for residual variation, and the flexibility of the approach allows new hypotheses that have often not been considered in resource preference analysis, including that availability has no relation to use. We also highlight a new design for resource use studies, with multiple individual‐use data sets from each of multiple sites, with different explanatory data for each site. We illustrate the approach with three examples. For two previously published habitat use data sets, we support the original conclusions and show that use is not unrelated to availability. For a data set of pollen collected by multiple bees from each of two sites, pollen use differs between the sites. Using bootstrap goodness‐of‐fit tests, we illustrate that the Dirichlet‐multinomial is acceptable for two of the examples but unsuitable for one of the habitat use examples.
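To make the central distribution concrete, the following is a minimal sketch of the standard Dirichlet‐multinomial log‐probability for one sample of counts. It is not taken from the paper: the function name `dirichlet_multinomial_logpmf` and the arguments `counts` and `alpha` are hypothetical, and the parameterization by a concentration vector `alpha` is an assumption that may differ from the one the authors use.

```python
from math import lgamma, exp


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of one count vector under the Dirichlet-multinomial.

    counts: non-negative integer counts, one per category (hypothetical name).
    alpha:  positive concentration parameters, one per category (assumed
            parameterization; expected proportions are alpha / sum(alpha)).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a0 = sum(alpha)
    # Log multinomial coefficient: n! / prod(x_k!)
    log_coef = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Log ratio of gamma functions from integrating out the proportions
    log_ratio = lgamma(a0) - lgamma(n + a0) + sum(
        lgamma(x + a) - lgamma(a) for x, a in zip(counts, alpha)
    )
    return log_coef + log_ratio


# With two categories and alpha = (1, 1), every split of n items is equally
# likely, so each of the n + 1 outcomes has probability 1 / (n + 1).
p = exp(dirichlet_multinomial_logpmf([1, 2], [1.0, 1.0]))
```

Summing this log‐probability over samples gives a log‐likelihood that could be maximized numerically. Smaller values of `sum(alpha)` correspond to more between‐sample heterogeneity (overdispersion), while very large values approach the ordinary multinomial.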