Selecting pseudo‐absences for species distribution models: how, where and how many?
Author(s) -
Barbet‐Massin Morgane,
Jiguet Frédéric,
Albert Cécile Hélène,
Thuiller Wilfried
Publication year - 2012
Publication title -
Methods in Ecology and Evolution
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.425
H-Index - 105
ISSN - 2041-210X
DOI - 10.1111/j.2041-210x.2011.00172.x
Subject(s) - regression , environmental niche modelling , statistics , species distribution , regression analysis , distribution (mathematics) , prediction interval , predictive modelling , computer science , machine learning , mathematics , ecology , artificial intelligence , econometrics , biology , mathematical analysis , ecological niche , habitat
Summary

1. Species distribution models are increasingly used to address questions in conservation biology, ecology and evolution. The most effective species distribution models require data on both species presence and the available environmental conditions (known as background or pseudo‐absence data) in the area. However, there is still no consensus on how and where to sample these pseudo‐absences and how many.

2. In this study, we conducted a comprehensive comparative analysis based on simple simulated species distributions to propose guidelines on how, where and how many pseudo‐absences should be generated to build reliable species distribution models. Depending on the quantity and quality of the initial presence data (unbiased vs. climatically or spatially biased), we assessed the relative effect of the method for selecting pseudo‐absences (random vs. environmentally or spatially stratified) and their number on the predictive accuracy of seven common modelling techniques (regression, classification and machine‐learning techniques).

3. When using regression techniques, the method used to select pseudo‐absences had the greatest impact on the model’s predictive accuracy. Randomly selected pseudo‐absences yielded the most reliable distribution models. Models fitted with a large number of pseudo‐absences but equally weighted to the presences (i.e. the weighted sum of presence equals the weighted sum of pseudo‐absence) produced the most accurate predicted distributions. For classification and machine‐learning techniques, the number of pseudo‐absences had the greatest impact on model accuracy, and averaging several runs with fewer pseudo‐absences than for regression techniques yielded the most predictive models.

4. Overall, we recommend the use of a large number (e.g. 10 000) of pseudo‐absences with equal weighting for presences and absences when using regression techniques (e.g. generalised linear model and generalised additive model); averaging several runs (e.g. 10) with fewer pseudo‐absences (e.g. 100) with equal weighting for presences and absences with multiple adaptive regression splines and discriminant analyses; and using the same number of pseudo‐absences as available presences (averaging several runs if few pseudo‐absences) for classification techniques such as boosted regression trees, classification trees and random forest. In addition, we recommend the random selection of pseudo‐absences when using regression techniques and the random selection of geographically and environmentally stratified pseudo‐absences when using classification and machine‐learning techniques.
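The equal-weighting rule in the Summary (the weighted sum of presences equals the weighted sum of pseudo‐absences) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the grid-cell representation and the choice to exclude known presence cells from the candidate pool are assumptions made only for this example.

```python
import random


def sample_pseudo_absences(background_cells, presence_cells, n, seed=0):
    """Randomly draw up to n pseudo-absence cells from the background.

    Assumption for this sketch: cells with a recorded presence are
    excluded from the candidate pool before sampling.
    """
    presence_set = set(presence_cells)
    candidates = [c for c in background_cells if c not in presence_set]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


def equal_weights(n_presences, n_pseudo_absences):
    """Return case weights with presences first, then pseudo-absences.

    Each presence gets weight 1 and each pseudo-absence gets
    n_presences / n_pseudo_absences, so both groups have the same
    total weight.
    """
    pa_weight = n_presences / n_pseudo_absences
    return [1.0] * n_presences + [pa_weight] * n_pseudo_absences


if __name__ == "__main__":
    # Hypothetical 200 x 100 grid with 50 presence cells.
    background = [(x, y) for x in range(200) for y in range(100)]
    presences = [(i, i) for i in range(50)]

    pseudo_absences = sample_pseudo_absences(background, presences, 10_000)
    weights = equal_weights(len(presences), len(pseudo_absences))

    print(len(pseudo_absences), "pseudo-absences drawn")
    print("presence weight total:", sum(weights[:len(presences)]))
    print("pseudo-absence weight total:", sum(weights[len(presences):]))
```

Weights of this form could then be passed as case weights to a regression fit. For the "several runs" recommendation, the same sampler could be called with different seeds and the resulting predictions averaged.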
