Open Access
On the selection of thresholds for predicting species occurrence with presence‐only data
Author(s) - Liu Canran, Newell Graeme, White Matt
Publication year - 2016
Publication title - Ecology and Evolution
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.17
H-Index - 63
ISSN - 2045-7758
DOI - 10.1002/ece3.1878
Subject(s) - statistics, random forest, selection (genetic algorithm), contrast (vision), mathematics, statistic, model selection, random effects model, species distribution, computer science, ecology, biology, artificial intelligence, medicine, meta analysis, habitat
Presence‐only data present challenges for selecting thresholds to transform species distribution modeling results into binary outputs. In this article, we compare two recently published threshold selection methods (max SSS and max Fpb) and examine the effectiveness of the threshold‐based prevalence estimation approach. Six virtual species with varying prevalence were simulated within a real landscape in southeastern Australia. Presence‐only models were built with DOMAIN, generalized linear model, Maxent, and Random Forest. Thresholds were selected with the two methods (max SSS and max Fpb) using four presence‐only datasets with different ratios of the number of known presences to the number of random points (KP–RP ratio). Sensitivity, specificity, true skill statistic, and F measure were used to evaluate the performance of the results. Species prevalence was estimated as the ratio of the number of predicted presences to the total number of points in the evaluation dataset. Thresholds selected with max Fpb varied as the KP–RP ratio of the threshold selection datasets changed. Datasets with a KP–RP ratio around 1 generally produced better results than datasets with ratios distant from 1. Results produced by max Fpb had specificity that was too low for very common species using Random Forest and Maxent models. In contrast, max SSS produced consistent results whichever dataset was used. The estimation of prevalence was almost always biased, and the bias was very large for DOMAIN and Random Forest predictions. We conclude that max Fpb is affected by the KP–RP ratio of the threshold selection datasets, but max SSS is almost unaffected by this ratio. Unbiased estimates of prevalence are difficult to obtain using the threshold‐based approach.
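
As an illustration of the two computations the abstract describes, the following is a minimal Python sketch, not the authors' implementation. It assumes that max SSS means choosing the threshold that maximizes the sum of sensitivity and specificity, a definition the abstract does not spell out, and that random points stand in for absences when computing specificity. The function names and the toy scores are hypothetical. The second threshold method is not sketched because the abstract does not give its formula.

```python
def max_sss_threshold(presence_scores, random_scores):
    """Return (threshold, sss) maximizing sensitivity + specificity.

    Assumption: sensitivity is measured on known presences and
    specificity on random points treated as pseudo-absences.
    A point is predicted present when its score >= threshold.
    """
    candidates = sorted(set(presence_scores) | set(random_scores))
    best_t, best_sss = None, -1.0
    for t in candidates:
        sens = sum(s >= t for s in presence_scores) / len(presence_scores)
        spec = sum(s < t for s in random_scores) / len(random_scores)
        sss = sens + spec
        if sss > best_sss:  # ties keep the lowest candidate threshold
            best_t, best_sss = t, sss
    return best_t, best_sss


def estimate_prevalence(eval_scores, threshold):
    """Threshold-based prevalence: predicted presences / total points."""
    return sum(s >= threshold for s in eval_scores) / len(eval_scores)


# Hypothetical model outputs (suitability scores in [0, 1]).
known_presences = [0.9, 0.8, 0.7, 0.4]
random_points = [0.1, 0.2, 0.3, 0.6, 0.85]
evaluation_set = [0.1, 0.5, 0.45, 0.2, 0.9, 0.3, 0.05, 0.7]

threshold, sss = max_sss_threshold(known_presences, random_points)
prevalence = estimate_prevalence(evaluation_set, threshold)
print(threshold, sss, prevalence)
```

The prevalence estimate depends entirely on the selected threshold, which is consistent with the abstract's finding that estimates from this approach are almost always biased.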
