Inflation of type I error rates due to differential misclassification in EHR‐derived outcomes: Empirical illustration using breast cancer recurrence
Author(s) - Chen Yong, Wang Jianqiao, Chubak Jessica, Hubbard Rebecca A.
Publication year - 2019
Publication title - Pharmacoepidemiology and Drug Safety
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.023
H-Index - 96
eISSN - 1099-1557
pISSN - 1053-8569
DOI - 10.1002/pds.4680
Subject(s) - medicine , nominal level , breast cancer , type i and type ii errors , spurious relationship , statistics , information bias , cohort , selection bias , oncology , cancer , confidence interval , pathology , mathematics
Purpose - Many outcomes derived from electronic health records (EHR) are not only imperfect but may also suffer from exposure-dependent differential misclassification, because the quality and availability of EHR data vary across exposure groups. The objective of this study was to quantify the inflation of type I error rates that can result from differential outcome misclassification.

Methods - We used data on gold-standard and EHR-derived second breast cancers in a cohort of women with a prior breast cancer diagnosis from 1993 to 2006 enrolled in Kaiser Permanente Washington. We simulated an exposure that was independent of the true outcome status. A surrogate outcome was then simulated with sensitivity and specificity that varied according to exposure status. We estimated the type I error rate for a test of association relating this exposure to the surrogate outcome, while varying outcome sensitivity and specificity in exposed individuals.

Results - Type I error rates were substantially inflated above the nominal level (5%) for even modest departures from nondifferential misclassification. Holding sensitivity in the exposed and unexposed groups at 85%, a difference in specificity of 10 percentage points between the exposed and unexposed (80% vs 90%) resulted in a 36% type I error rate. Type I error was inflated more by differential specificity than by differential sensitivity.

Conclusions - Differential outcome misclassification may induce spurious findings. Researchers using EHR-derived outcomes should use misclassification-adjusted methods whenever possible or conduct sensitivity analyses to investigate the possibility of false-positive findings, especially for exposures that may be related to the accuracy of outcome ascertainment.
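
The simulation design described in the Methods can be illustrated with a minimal sketch. It is not the authors' code and does not use their cohort: the sample size, outcome prevalence, exposure prevalence, number of replicates, and the pooled two-proportion z-test are all assumptions chosen for illustration, and the function names are hypothetical. Because those inputs differ from the study data, the sketch should not be expected to reproduce the 36% figure; it only shows the qualitative effect of letting specificity depend on an exposure that is independent of the true outcome.

```python
import math
import random


def two_proportion_p_value(x1, n1, x0, n0):
    """Two-sided p-value from a pooled two-proportion z-test (assumed test)."""
    pooled = (x1 + x0) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x0 / n0) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def type_i_error_rate(n, prevalence, p_exposed, sens, spec,
                      n_sims=500, alpha=0.05, seed=0):
    """Share of simulated data sets in which the null is rejected.

    sens and spec are (unexposed, exposed) pairs. Exposure is drawn
    independently of the true outcome, so every rejection is a false positive.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # counts[exposure] = [surrogate positives, group size]
        counts = {0: [0, 0], 1: [0, 0]}
        for _ in range(n):
            exposed = int(rng.random() < p_exposed)
            truth = rng.random() < prevalence  # independent of exposure
            p_positive = sens[exposed] if truth else 1 - spec[exposed]
            counts[exposed][0] += int(rng.random() < p_positive)
            counts[exposed][1] += 1
        (x0, n0), (x1, n1) = counts[0], counts[1]
        if min(n0, n1) == 0:
            continue  # degenerate draw: one exposure group is empty
        if two_proportion_p_value(x1, n1, x0, n0) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Hypothetical settings, not taken from the study cohort.
    settings = dict(n=1000, prevalence=0.10, p_exposed=0.5, n_sims=200)
    nondifferential = type_i_error_rate(
        sens=(0.85, 0.85), spec=(0.90, 0.90), **settings)
    differential = type_i_error_rate(
        sens=(0.85, 0.85), spec=(0.90, 0.80), **settings)
    print("nondifferential misclassification:", nondifferential)
    print("differential specificity (90% vs 80%):", differential)
```

With a rare outcome, most individuals are true non-cases, so a change in specificity shifts the observed surrogate-outcome rate more than an equal change in sensitivity would. This is consistent with the abstract's finding that differential specificity inflates type I error more than differential sensitivity.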