
Bias and information in biological records
Author(s) -
Isaac Nick J. B.,
Pocock Michael J. O.
Publication year - 2015
Publication title -
Biological Journal of the Linnean Society
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.906
H-Index - 112
eISSN - 1095-8312
pISSN - 0024-4066
DOI - 10.1111/bij.12532
Biological recording is in essence a very simple concept in which a record is the report of a species at a physical location at a certain time. The collation of these records into a dataset is a powerful approach to addressing large‐scale questions about biodiversity change. Records are collected by volunteers at times and places that suit them, leading to a variety of biases: uneven sampling over space and time, uneven sampling effort per visit and uneven detectability. These need to be controlled for in statistical analyses that use biological records. In particular, the data are ‘presence‐only’, and lack information on the sampling protocol or intensity. Submitting ‘complete lists’ of all the species seen is one potential solution because the data can be treated as ‘presence–absence’ and detectability of each species can be statistically modelled. The corollary of bias is that records vary in their ‘information content’. The information content is a measure of how much an individual record, or collection of records, contributes to reducing uncertainty in a parameter of interest. The information content of biological records varies, depending on the question to which the data are being applied. We consider a set of hypothetical ‘syndromes’ of recording behaviour, each of which is characterized by different information content. We demonstrate how these concepts can be used to support the growth of a particular type of recording behaviour. Approaches to recording are rapidly changing, especially with the growth of mass participation citizen science. We discuss how these developments present a range of challenges and opportunities for biological recording in the future. © 2015 The Linnean Society of London, Biological Journal of the Linnean Society, 2015, ●●, ●●–●●.
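The idea of information content as a reduction in uncertainty can be illustrated with a minimal sketch. It is not the authors' method. It assumes that the parameter of interest is a species' reporting rate (the proportion of complete lists that include it), that each list is an independent trial, and that uncertainty is measured as the variance of a Beta posterior under a uniform Beta(1, 1) prior. The function names `beta_variance` and `information_gain` are hypothetical.

```python
def beta_variance(a: float, b: float) -> float:
    """Variance of a Beta(a, b) distribution."""
    return (a * b) / ((a + b) ** 2 * (a + b + 1))


def information_gain(detections: int, visits: int, prior=(1.0, 1.0)) -> float:
    """Reduction in variance of the reporting-rate estimate.

    Assumption: `visits` complete lists were submitted and the species
    appeared on `detections` of them; lists without the species count as
    non-detections, which is only possible because the lists are complete.
    """
    if not 0 <= detections <= visits:
        raise ValueError("detections must lie between 0 and visits")
    a, b = prior
    before = beta_variance(a, b)
    # Conjugate Beta-Binomial update: add detections and non-detections.
    after = beta_variance(a + detections, b + (visits - detections))
    return before - after


if __name__ == "__main__":
    # Ten lists versus one hundred lists with the same observed rate.
    print(information_gain(2, 10))
    print(information_gain(20, 100))
```

Under these assumptions, a larger set of complete lists yields a larger reduction in variance, while presence-only records supply no non-detections and so cannot be scored in this way without further modelling.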