The effect of sample size on the accuracy of species distribution models: considering both presences and pseudo‐absences or background sites

Open Access

Author(s) - Liu Canran, Newell Graeme, White Matt

Publication year - 2019

Publication title - Ecography

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.973

H-Index - 128

eISSN - 1600-0587

pISSN - 0906-7590

DOI - 10.1111/ecog.03188

Most high‐performing species distribution modelling techniques require presences together with absences, pseudo‐absences or background points. In this paper, we explore the effect of sample size on model accuracy, with the aim of developing improved modelling strategies. We generated 1800 virtual species at three levels of prevalence and modelled them with ten techniques, varying the number of training presences (NTP) and the number of random points (NRP; the random points serve as pseudo‐absences or background sites). For five of the ten techniques we built two versions of each model: one with an equal total weight (ETW) setting, in which the total weight of the pseudo‐absences equals the total weight of the presences, and one with an unequal total weight (UTW) setting, in which the two totals are not required to be equal. We compared two strategies for NRP: a small multiplier strategy (i.e. setting NRP to a few times NTP) and a large number strategy (i.e. using a large number of random points). For three magnitudes of NTP and both NRP strategies, we also produced ensemble models by averaging the predictions of 30 models built with the same set of training presences and different sets of random points of equal size. We found that, as NRP increased, model accuracy followed four distinct patterns: increasing, decreasing, arch‐shaped and horizontal. In most cases ETW improved model performance. Ensemble models were more accurate than the corresponding single models, and the improvement was most pronounced when NTP was low. We conclude that a large NRP is not always an appropriate strategy: the best choice of NRP depends on the modelling technique, species prevalence and NTP. We recommend building ensemble models rather than single models and using the small multiplier strategy for NRP with ETW, especially when only a few species presence records are available.
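The abstract describes three procedural ideas: ETW versus UTW weighting, the small multiplier strategy for NRP, and ensembles averaged over 30 models that share presences but draw fresh random points. The following is a minimal Python sketch of how these could fit together, under stated assumptions. ETW is interpreted here as giving every presence weight 1 and every random point weight NTP/NRP. UTW is interpreted as giving every point weight 1. The multiplier of 3 is an illustrative choice, not a value from the paper. The names `etw_weights`, `utw_weights`, `small_multiplier_nrp`, `sample_uniform`, `toy_fit` and `ensemble_predict` are all hypothetical. `toy_fit` is a deliberately simple stand-in on a one-dimensional environmental gradient and is not any of the ten modelling techniques used in the study.

```python
import random
from statistics import mean


def etw_weights(n_presence, n_random):
    """Equal total weight (ETW), as assumed here: each presence gets weight 1
    and each random point gets n_presence / n_random, so both classes sum to
    n_presence."""
    if n_presence <= 0 or n_random <= 0:
        raise ValueError("point counts must be positive")
    return [1.0] * n_presence, [n_presence / n_random] * n_random


def utw_weights(n_presence, n_random):
    """Unequal total weight (UTW), as assumed here: every point gets weight 1,
    so the random-point total grows with n_random."""
    return [1.0] * n_presence, [1.0] * n_random


def small_multiplier_nrp(n_presence, multiplier=3):
    """Small multiplier strategy: NRP is a few times NTP.
    multiplier=3 is an illustrative assumption, not a value from the paper."""
    return multiplier * n_presence


def sample_uniform(n, rng):
    """Draw n random points uniformly from a 1-D environmental range [0, 1)."""
    return [rng.random() for _ in range(n)]


def toy_fit(presences, randoms):
    """Hypothetical stand-in for an SDM technique: returns a predictor giving
    the ETW-weighted share of presences among points within `radius` of x."""
    w_p, w_r = etw_weights(len(presences), len(randoms))

    def predict(x, radius=0.1):
        p = sum(w for v, w in zip(presences, w_p) if abs(v - x) <= radius)
        r = sum(w for v, w in zip(randoms, w_r) if abs(v - x) <= radius)
        return p / (p + r) if p + r > 0 else 0.0

    return predict


def ensemble_predict(fit, presences, sample_points, n_random, sites,
                     n_models=30, seed=0):
    """Average predictions from n_models fits that share the same presences
    but use a fresh, equally sized set of random points each time."""
    rng = random.Random(seed)
    per_model = []
    for _ in range(n_models):
        randoms = sample_points(n_random, rng)
        predict = fit(presences, randoms)
        per_model.append([predict(s) for s in sites])
    # zip(*per_model) groups the n_models predictions made for each site
    return [mean(column) for column in zip(*per_model)]


if __name__ == "__main__":
    presences = [0.2, 0.22, 0.25, 0.3, 0.31]    # toy presence records
    sites = [0.0, 0.25, 0.5, 1.0]               # sites to predict at
    nrp = small_multiplier_nrp(len(presences))  # 15 random points per model
    print(ensemble_predict(toy_fit, presences, sample_uniform, nrp, sites))
```

Under the ETW interpretation used here, the summed weight of the random points stays equal to NTP however many random points are drawn. Under UTW, that total grows with NRP. This is one way to see why the two settings can respond differently as NRP changes.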