
Bias Associated with Mining Electronic Health Records
Author(s) -
George Hripcsak,
Charles Knirsch,
Li Zhou,
Adam B. Wilcox,
Genevieve B. Melton
Publication year - 2011
Publication title - Journal of Biomedical Discovery and Collaboration
Language(s) - Uncategorized
Resource type - Journals
ISSN - 1747-5333
DOI - 10.5210/disco.v6i0.3581
Subject(s) - gold standard (test) , health records , cohort , electronic health record , computer science , retrospective cohort study , cohort study , scale (ratio) , data science , data mining , statistics , econometrics , medicine , health care , geography , mathematics , pathology , cartography , political science , law
Large-scale electronic health record research introduces biases compared with traditional, manually curated retrospective research. We used data from a community-acquired pneumonia study for which we had a gold standard to illustrate such biases. The challenges include data inaccuracy, incompleteness, and complexity, and they can produce distorted results. We found that a naïve approach approximated the gold standard, but errors in a minority of cases shifted the mortality estimate substantially. Manual review revealed errors in both selecting and characterizing the cohort, and narrowing the cohort improved the result. Nevertheless, a significantly narrowed cohort might contain its own biases that would be difficult to estimate.
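The point that errors in a minority of cases can move a mortality estimate substantially, and that narrowing the cohort trades one bias for another, can be illustrated with simple arithmetic. The sketch below is a hypothetical illustration only: every count in it is invented and is not data from the study, and it assumes crude mortality is simply deaths divided by cohort size.

```python
from fractions import Fraction


def mortality(deaths: int, n: int) -> Fraction:
    """Crude mortality proportion: deaths divided by cohort size."""
    if n <= 0:
        raise ValueError("cohort must be non-empty")
    return Fraction(deaths, n)


# HYPOTHETICAL counts for illustration only (not data from the study).

# Gold standard: manually confirmed community-acquired pneumonia cases.
gold = mortality(deaths=20, n=200)

# Naive EHR query: finds 180 true cases (18 deaths) but also admits
# 20 patients who do not meet the gold-standard definition (10 deaths).
# Only 10% of the cohort is misselected, yet the estimate moves a lot.
naive = mortality(deaths=18 + 10, n=180 + 20)

# Narrowed query: stricter criteria drop most false positives (2 remain,
# 1 death), but the 120 true cases it keeps (10 deaths) are not a random
# subset of all true cases, so the estimate now errs in the other direction.
narrowed = mortality(deaths=10 + 1, n=120 + 2)

for label, rate in [("gold standard", gold), ("naive", naive), ("narrowed", narrowed)]:
    print(f"{label:>13}: {float(rate):.1%}")
```

With these invented counts the script prints 10.0% for the gold standard, 14.0% for the naive cohort, and 9.0% for the narrowed cohort: the narrowed estimate is closer to the gold standard, but its remaining bias would be hard to detect without a gold standard to compare against.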