Model complexity and information in the data: Could it be a house built on sand?
Author(s) - Lele, Subhash R.
Publication year - 2010
Publication title - Ecology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.144
H-Index - 294
eISSN - 1939-9170
pISSN - 0012-9658
DOI - 10.1890/10-0099.1
Heisey et al. (2010), in an interesting paper, address the very difficult problem of analyzing spatially referenced, age-specific prevalence data. The general goal of the analysis is to understand how the force of infection changes as a function of age, time, and space. To further complicate matters, all the data considered in the paper are censored observations. Binary data are notoriously difficult to analyze, especially when latent processes are involved and prevalence is very low. Frankly, I was surprised by the complexity of the models they consider and the limited amount of information available to fit these models. I congratulate them for tackling such a difficult problem and, in the process, bringing to the attention of ecologists some important statistical models from survival analysis.

How does one generally deal with the conflicting issues of lack of information and the desire to conduct inference about complex underlying processes? The standard approach is to compensate for lack of information by adding assumptions. This is done routinely in most statistical analyses by assuming a parametric model. For example, one can conduct inference in ANOVA without assuming any specific relationship between the treatment means if replicate observations are available at each treatment level. If such replicate data are not available, instead of giving up, we assume that there is a linear (or some other parametric) relationship between the covariates and the response: the regression approach. This is a smoothing assumption. Similarly, in one of the fundamental papers on statistical inference in the presence of nuisance parameters, Kiefer and Wolfowitz (1956) showed that simply assuming that the nuisance parameters arise from a distribution is enough of a smoothing assumption to estimate not only the parameters of interest but also the distribution function from which the nuisance parameters are assumed to have arisen. Heisey et al. (2010) try to get around the limited information available in the prevalence data, where all observations are censored, by imposing constraints on the log-hazard, a smoothing assumption of another sort.

This is the easy part. The real questions are: (1) Given the limited amount of information in the data, what assumptions do we need so that some inference is feasible? and (2) Are these inferences primarily driven by the data or by the assumptions? Technically, the answer to the first question is straightforward: add assumptions until the parameters in the model, or at least the ones that are of scientific interest, are estimable given the data. The second question is qualitative. It is partially addressed by studying the sensitivity of the inferences about the parameters of interest (assuming they are identifiable) to violations of the assumptions. I will discuss these issues in the remainder of this commentary. I assume readers are familiar with the basic descriptions in Heisey et al. (2010).

Perhaps the easiest way out of the limited information in the prevalence data is to assume a specific parametric model for the log-hazard function. This does not guarantee that the parameters will be identifiable, but it has the best chance. Heisey et al. (2010) do not take this easy way out; they aspire to assume less about the form of the log-hazard function. As they point out, the "nonparametric" MLE of the log-hazard is very choppy and unstable. It is generally not consistent, at least not at the usual √n rate.
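To make the information issue concrete, the following is a minimal, hypothetical sketch (not the authors' code, model, or data). It assumes the standard current-status link, in which the probability of testing positive by age a is 1 − exp(−Λ(a)), with Λ(a) the cumulative force of infection; it simulates binary test results at random ages and then compares an unconstrained piecewise-constant hazard fit with many age bins to a coarse fit with a few bins. The age range, sample size, bin counts, and true hazard are all invented for illustration.

```python
# Hypothetical illustration: choppy "nonparametric" hazard estimates from
# age-specific prevalence (current-status) data, versus a crude smoothing
# assumption (fewer bins).  Not the method of Heisey et al. (2010).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

n = 300
ages = rng.uniform(0.5, 10.0, size=n)
# True force of infection lambda(a) = 0.02 + 0.01*a; its integral from 0 to a
# (the cumulative hazard) is 0.02*a + 0.005*a**2.
cum_haz = 0.02 * ages + 0.005 * ages**2
positive = rng.uniform(size=n) < 1.0 - np.exp(-cum_haz)

def neg_loglik(log_haz, edges):
    """Negative log-likelihood of current-status data under a step-function hazard."""
    haz = np.exp(log_haz)
    widths = np.diff(edges)
    # Cumulative hazard at each animal's age: hazard times exposure time in each bin.
    cumulative = np.array([
        np.sum(haz * np.clip(a - edges[:-1], 0.0, widths)) for a in ages
    ])
    p = np.clip(1.0 - np.exp(-cumulative), 1e-12, 1.0 - 1e-12)
    return -np.sum(positive * np.log(p) + (~positive) * np.log(1.0 - p))

def fit(n_bins):
    edges = np.linspace(0.0, 10.0, n_bins + 1)
    res = minimize(neg_loglik, x0=np.full(n_bins, np.log(0.05)),
                   args=(edges,), method="L-BFGS-B")
    return np.exp(res.x)

# Many bins: per-bin estimates bounce around because some bins carry almost no
# information.  Few bins: a blunt smoothing assumption that stabilizes the fit.
for n_bins in (20, 4):
    print(f"{n_bins:>2} bins:", np.round(fit(n_bins), 3))
```

Collapsing to a few bins is only one crude way to add smoothness; penalizing differences between adjacent log-hazard values, or imposing shape constraints as Heisey et al. (2010) do, plays the same role of trading assumptions for the information the censored binary data cannot supply on their own.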
