Presence‐Only Data and the EM Algorithm
Author(s) - Gill Ward, Trevor Hastie, Simon Barry, Jane Elith, John R. Leathwick
Publication year - 2009
Publication title - Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/j.1541-0420.2008.01116.x
Subject(s) - statistics , logistic regression , population , deviance (statistics) , categorical variable , expectation–maximization algorithm , computer science , logistic function , sampling (signal processing) , constraint (computer aided design) , algorithm , mathematics , maximum likelihood , demography , geometry , filter (signal processing) , sociology , computer vision
Summary - In ecological modeling of the habitat of a species, it can be prohibitively expensive to determine species absence. Presence‐only data consist of a sample of locations with observed presences and a separate group of locations sampled from the full landscape, with unknown presences. We propose an expectation–maximization algorithm to estimate the underlying presence–absence logistic model for presence‐only data. This algorithm can be used with any off‐the‐shelf logistic model. For models with stepwise fitting procedures, such as boosted trees, the fitting process can be accelerated by interleaving expectation steps within the procedure. Preliminary analyses based on sampling from presence–absence records of fish in New Zealand rivers illustrate that this new procedure can reduce both deviance and the shrinkage of marginal effect estimates that occur in the naive model often used in practice. Finally, it is shown that the population prevalence of a species is only identifiable when there is some unrealistic constraint on the structure of the logistic model. In practice, it is strongly recommended that an estimate of population prevalence be provided.
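The expectation–maximization idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' code and makes several assumptions that the summary does not state: the logistic model is linear in the covariates, it is fitted by a plain Newton iteration, and the population prevalence is supplied by the user (as the summary recommends). The function names `fit_logistic` and `presence_only_em` are hypothetical. The intercept offset log((n_p + prevalence * n_u) / (prevalence * n_u)) is a case-control correction derived under these assumptions, where n_p and n_u are the numbers of presence and background locations.

```python
import numpy as np

def expit(t):
    # Logistic function; the linear predictor is clipped to avoid overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Newton fit of a logistic regression; y may be fractional in [0, 1]."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        # A tiny ridge term keeps the Hessian invertible (an assumption here).
        hess = (X * w[:, None]).T @ X + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def presence_only_em(X_pres, X_back, prevalence, n_em=50):
    """EM sketch: presences have y = 1, background locations have unknown y.

    Returns coefficients (intercept first) on the population scale.
    Assumes `prevalence` (the population P(y = 1)) is known.
    """
    n_p, n_u = len(X_pres), len(X_back)
    X = np.column_stack([np.ones(n_p + n_u), np.vstack([X_pres, X_back])])
    # Offset between the sample-scale and population-scale intercepts.
    offset = np.log((n_p + prevalence * n_u) / (prevalence * n_u))
    # Start the unknown background responses at the prevalence.
    y = np.concatenate([np.ones(n_p), np.full(n_u, float(prevalence))])
    beta = np.zeros(X.shape[1])
    for _ in range(n_em):
        # M-step: fit the logistic model to the completed (fractional) data.
        beta_star = fit_logistic(X, y)
        # Convert to the population scale by removing the sampling offset.
        beta = beta_star.copy()
        beta[0] -= offset
        # E-step: expected presence at each background location.
        y[n_p:] = expit(X[n_p:] @ beta)
    return beta
```

Here `X_pres` and `X_back` are two-dimensional arrays of covariates (rows are locations), and the returned vector holds the intercept followed by the slopes. A fixed number of EM iterations is used for brevity; a convergence check on the coefficients would be a natural refinement.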