Impacts of imperfect reference data on the apparent accuracy of species presence–absence models and their predictions
Author(s) -
Foody, Giles M.
Publication year - 2011
Publication title -
Global Ecology and Biogeography
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.164
H-Index - 152
eISSN - 1466-8238
pISSN - 1466-822X
DOI - 10.1111/j.1466-8238.2010.00605.x
Subject(s) - imperfect , statistics , reference data , econometrics , independence (probability theory) , data quality , standard error , sample size determination , computer science , dependency (uml) , sample (material) , ecology , mathematics , data mining , artificial intelligence , biology , metric (unit) , philosophy , linguistics , operations management , chemistry , chromatography , economics
Aim To explore the impacts of imperfect reference data on the accuracy of species distribution model predictions. The main focus is on the impacts of reference data quality (labelling accuracy) and, to a lesser degree, data quantity (sample size) on species presence–absence modelling.

Innovation The paper challenges the common assumption that some popular measures of model accuracy, and model predictions, are independent of prevalence. It highlights how imperfect reference data may affect a study and the actions that may be taken to address the resulting problems.

Main conclusions The theoretical prevalence independence of popular accuracy measures, such as sensitivity, specificity, the true skill statistic (TSS) and the area under the receiver operating characteristic curve (AUC), is unlikely to hold in practice because of reference data error. All of these measures of accuracy, together with estimates of species occurrence, showed prevalence dependency arising through the use of a non‐gold‐standard reference. The number of cases used also had implications for the ability of a study to meet its objectives. Means to reduce the negative effects of imperfect reference data in study design and interpretation are suggested.
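
The prevalence dependency described in the main conclusions can be illustrated with a short calculation. The sketch below is not taken from the paper. It assumes that the model and the reference data make errors independently of each other given the true presence or absence of the species, and the function name, parameter names and numeric values are all hypothetical choices made for illustration. Under that assumption it computes the sensitivity, specificity and TSS that would be observed when the model is scored against the imperfect reference rather than against the truth.

```python
def apparent_accuracy(prevalence, model_se, model_sp, ref_se, ref_sp):
    """Accuracy of a model as measured against an imperfect reference.

    ASSUMPTION (not from the source): model and reference errors are
    conditionally independent given the true state. Requires
    0 < prevalence < 1 so that both reference classes are non-empty.
    """
    p = prevalence
    # Probability that the reference labels a site as "present".
    ref_pos = p * ref_se + (1 - p) * (1 - ref_sp)
    ref_neg = 1 - ref_pos
    # Probability that model and reference both say "present" / both say "absent".
    both_pos = p * model_se * ref_se + (1 - p) * (1 - model_sp) * (1 - ref_sp)
    both_neg = p * (1 - model_se) * (1 - ref_se) + (1 - p) * model_sp * ref_sp
    app_se = both_pos / ref_pos   # apparent sensitivity
    app_sp = both_neg / ref_neg   # apparent specificity
    app_tss = app_se + app_sp - 1  # apparent true skill statistic
    return app_se, app_sp, app_tss


# Hypothetical values: a model with true sensitivity and specificity of 0.9,
# scored against a reference whose labels are themselves 90% accurate.
for prevalence in (0.1, 0.3, 0.5):
    se, sp, tss = apparent_accuracy(prevalence, 0.9, 0.9, 0.9, 0.9)
    print(f"prevalence={prevalence:.1f}  sensitivity={se:.3f}  "
          f"specificity={sp:.3f}  TSS={tss:.3f}")
```

With a perfect reference (`ref_se = ref_sp = 1`) the function returns the model's true sensitivity and specificity at every prevalence. Once the reference contains labelling error, the apparent values shift as prevalence changes, even though the model itself is unchanged.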