Open Access
Big Data and Large Sample Size: A Cautionary Note on the Potential for Bias
Author(s) - Kaplan Robert M., Chambers David A., Glasgow Russell E.
Publication year - 2014
Publication title - Clinical and Translational Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.303
H-Index - 44
eISSN - 1752-8062
pISSN - 1752-8054
DOI - 10.1111/cts.12178
Subject(s) - big data, sample size determination, sampling (signal processing), statistics, sample (material), sampling bias, computer science, econometrics, observational error, sampling error, data science, data mining, mathematics, telecommunications, chemistry, chromatography, detector
Abstract - A number of commentaries have suggested that large studies are more reliable than smaller studies, and there is growing interest in the analysis of “big data” that integrates information from many thousands of persons and/or different data sources. We consider a variety of biases that are likely in the era of big data, including sampling error, measurement error, multiple comparisons errors, aggregation error, and errors associated with the systematic exclusion of information. Using examples from epidemiology, health services research, studies on determinants of health, and clinical trials, we conclude that greater caution is needed to ensure that big sample size does not lead to big inferential errors. Despite the advantages of big studies, large sample size can magnify the bias associated with error resulting from sampling or study design. Clin Trans Sci 2014; Volume #: 1–5
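
The abstract's closing point, that a large sample can magnify the effect of a biased design, can be illustrated with a small simulation. The sketch below is not taken from the article: the true prevalence and the two inclusion probabilities are illustrative assumptions, and the function names are hypothetical. It draws records through a selection mechanism that favors cases, then computes a standard 95% normal-approximation confidence interval for the prevalence at a small and a large sample size.

```python
import math
import random

# Illustrative assumptions, not values from the article.
TRUE_PREVALENCE = 0.30       # assumed prevalence in the full population
INCLUSION_IF_CASE = 0.9      # assumed chance that a case enters the data set
INCLUSION_IF_NONCASE = 0.6   # assumed chance that a non-case enters the data set


def biased_sample(n, rng):
    """Collect n records through a selection mechanism that favors cases."""
    sample = []
    while len(sample) < n:
        is_case = rng.random() < TRUE_PREVALENCE
        p_include = INCLUSION_IF_CASE if is_case else INCLUSION_IF_NONCASE
        if rng.random() < p_include:
            sample.append(1 if is_case else 0)
    return sample


def estimate_with_ci(sample):
    """Return the sample proportion and its 95% normal-approximation interval."""
    n = len(sample)
    p_hat = sum(sample) / n
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - half_width, p_hat + half_width


def run(seed=0):
    """Estimate prevalence from a small and a large biased sample."""
    rng = random.Random(seed)
    return {n: estimate_with_ci(biased_sample(n, rng)) for n in (100, 100_000)}


if __name__ == "__main__":
    for n, (p_hat, lower, upper) in run().items():
        covers = lower <= TRUE_PREVALENCE <= upper
        print(f"n={n:>7}: estimate={p_hat:.3f}, "
              f"95% CI=({lower:.3f}, {upper:.3f}), covers truth: {covers}")
```

Under these assumptions the selection mechanism pulls the expected estimate toward 0.27 / 0.69, roughly 0.39, whatever the sample size. Increasing n only narrows the interval around that biased value, so the large sample yields a precise-looking interval that excludes the true prevalence of 0.30.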
