Biases in lake water quality sampling and implications for macroscale research
Author(s) -
Stanley Emily H.,
Collins Sarah M.,
Lottig Noah R.,
Oliver Samantha K.,
Webster Katherine E.,
Cheruvelil Kendra S.,
Soranno Patricia A.
Publication year - 2019
Publication title -
Limnology and Oceanography
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.7
H-Index - 197
eISSN - 1939-5590
pISSN - 0024-3590
DOI - 10.1002/lno.11136
Subject(s) - eutrophication, environmental science, sampling (signal processing), water quality, trophic level, chlorophyll a, term (time), resampling, statistics, physical geography, hydrology (agriculture), nutrient, ecology, geography, biology, mathematics, computer science, botany, physics, geotechnical engineering, filter (signal processing), quantum mechanics, engineering, computer vision
Growth of macroscale limnological research has been accompanied by an increase in secondary datasets compiled from multiple sources. We examined patterns of data availability in LAGOS‐NE, a dataset derived from 87 sources, to identify biases in availability of lake water quality data and to consider how such biases might affect perceived patterns at a subcontinental scale. Of eight common water quality parameters, variables indicative of trophic state (Secchi, chlorophyll, and total P) were most abundant in terms of total observations, lakes sampled, and long‐term records, whereas carbon variables (true color and dissolved organic carbon) were scarcest. Most data were collected during summer from larger (≥ 20 ha) lakes over 1–3 yr. Approximately 80% of data for each variable is derived from ~20% of sampled lakes. Long‐term (≥ 20 yr) records were rare and spatially clustered. Data availability is linked to major management challenges (eutrophication and acid rain), citizen science, and a few programs that quantify C and N variables. Resampling exercises suggested that correcting for the surface area sampling bias did not substantially change statistical distributions of the eight variables. Further, estimating a lake's long‐term median Secchi, chlorophyll, and total P using average record lengths had high uncertainty, but modest increases in sample size to > 5 yr yielded estimates with manageable error. Although the specific nature of sampling biases may vary among regions, we expect that they are widespread. Thus, large integrated datasets can and should be used to identify tendencies in how lakes are studied and to address these biases as part of broad‐scale limnological investigations.
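The abstract does not describe how the record-length resampling was carried out, so the following is only a minimal sketch of one way such an exercise could be set up, not the authors' procedure. It assumes a lake with one summer value per year, draws random subsets of n years without replacement, and reports the mean relative error of the subset median against the full-record median. The function name `median_error_by_record_length`, the 25-year record length, and the synthetic Secchi values are all hypothetical and chosen for illustration.

```python
import random
import statistics


def median_error_by_record_length(annual_values, n_years, n_draws=1000, seed=0):
    """Mean relative error of a median estimated from n_years randomly chosen years.

    The "true" long-term median is taken to be the median of the full record,
    which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    long_term_median = statistics.median(annual_values)
    errors = []
    for _ in range(n_draws):
        # Draw n_years distinct years from the record (no replacement).
        subsample = rng.sample(annual_values, n_years)
        estimate = statistics.median(subsample)
        errors.append(abs(estimate - long_term_median) / long_term_median)
    return statistics.mean(errors)


# Synthetic 25-year record of annual summer Secchi depth in metres.
# These values are generated for illustration and are not LAGOS-NE data.
_rng = random.Random(42)
secchi = [_rng.lognormvariate(1.0, 0.3) for _ in range(25)]

for n in (1, 3, 5, 10):
    err = median_error_by_record_length(secchi, n)
    print(f"{n:>2} yr record: mean relative error = {err:.3f}")
```

With a real record, the same loop could be run per lake and per variable to see at what record length the error falls below a tolerance of interest. The error threshold and the choice of sampling without replacement are design decisions that would need to match the question being asked.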