Combining multiple data sources in species distribution models while accounting for spatial dependence and overfitting with combined penalized likelihood maximization
Author(s) -
Renner Ian W.,
Louvrier Julie,
Gimenez Olivier
Publication year - 2019
Publication title -
Methods in Ecology and Evolution
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.425
H-Index - 105
ISSN - 2041-210X
DOI - 10.1111/2041-210x.13297
Subject(s) - overfitting , likelihood function , lasso (programming language) , expectation–maximization algorithm , computer science , restricted maximum likelihood , maximization , econometrics , poisson distribution , statistics , independence (probability theory) , maximum likelihood , mathematics , artificial intelligence , mathematical optimization , artificial neural network , world wide web
The increase in availability of species datasets means that approaches to species distribution modelling that incorporate multiple datasets are in greater demand. Recent methodological developments in this area have led to combined likelihood approaches, in which a log‐likelihood formed as the sum of the log‐likelihood components of each data source is maximized. Often, these approaches make use of at least one presence‐only dataset and use the log‐likelihood of an inhomogeneous Poisson point process model in the combined likelihood construction. While these advancements have been shown to improve predictive performance, they do not currently address challenges in presence‐only modelling such as checking and correcting for violations of the independence assumption of a Poisson point process model, or more general challenges in species distribution modelling such as overfitting. In this paper, we present an extension of the combined likelihood framework which accommodates alternative presence‐only likelihoods in the presence of spatial dependence as well as lasso‐type penalties to account for potential overfitting. We compare the proposed combined penalized likelihood approach to the standard combined likelihood approach via simulation and apply the method to modelling the distribution of the Eurasian lynx in the Jura Mountains in eastern France. The simulations show that the proposed combined penalized likelihood approach has better predictive performance than the standard approach when spatial dependence is present in the data. The lynx analysis shows that the predicted maps vary significantly between the model fitted with the proposed combined penalized approach accounting for spatial dependence and the model fitted with the standard combined likelihood. This work highlights the benefits of careful consideration of the presence‐only components of the combined likelihood formulation, and allows greater flexibility in accommodating real datasets.
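To make the idea of a combined penalized likelihood concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes one presence‐only dataset modelled with an inhomogeneous Poisson point process (log‐likelihood approximated by quadrature) and one presence‐absence dataset linked to the same intensity through a complementary log‐log relationship, both sharing the coefficient vector `beta`. The function names, the quadrature setup, the `site_area` parameter, and the choice to leave the intercept unpenalized are all illustrative assumptions. The alternative presence‐only likelihoods for spatial dependence mentioned in the abstract are not shown.

```python
import numpy as np

def ppm_loglik(beta, X_quad, weights, is_presence):
    """Quadrature approximation to a Poisson point process log-likelihood.

    X_quad holds covariates at all quadrature points (presence points included),
    weights are the quadrature weights, is_presence flags the presence points.
    """
    eta = X_quad @ beta                      # log-intensity at each quadrature point
    # Sum of log-intensity at presences minus the approximated integral of intensity.
    return np.sum(eta[is_presence]) - np.sum(weights * np.exp(eta))

def pa_loglik(beta, X_sites, y, site_area=1.0):
    """Bernoulli log-likelihood for presence-absence sites (assumed cloglog link)."""
    eta = X_sites @ beta
    p = 1.0 - np.exp(-site_area * np.exp(eta))   # P(at least one individual at the site)
    p = np.clip(p, 1e-12, 1 - 1e-12)             # guard against log(0)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def combined_penalized_loglik(beta, po, pa, lam):
    """Sum of the two log-likelihood components minus a lasso penalty.

    po = (X_quad, weights, is_presence); pa = (X_sites, y).
    The intercept beta[0] is left unpenalized (an assumption of this sketch).
    """
    penalty = lam * np.sum(np.abs(beta[1:]))
    return ppm_loglik(beta, *po) + pa_loglik(beta, *pa) - penalty
```

In practice this objective would be maximized over `beta` for a grid of `lam` values, with `lam` chosen by some model selection criterion; setting `lam = 0` recovers the unpenalized combined likelihood.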