Robust empirical calibration of p-values using observational data
Author(s) - Schuemie Martijn J., Hripcsak George, Ryan Patrick B., Madigan David, Suchard Marc A.
Publication year - 2016
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.6977
Subject(s) - observational study, calibration, statistics, computer science, econometrics, medicine, mathematics
In our previous paper [1], we proposed empirical calibration of p-values as a strategy for mitigating the risk of systematic error when estimating average treatment effects from observational studies. By estimating the effect of exposure on outcomes across a collection of settings where the exposure is not believed to cause the outcome (negative controls), one can estimate an empirical null distribution of the exposure effect and compute calibrated p-values that take both random and systematic error into account. Gruber and Tchetgen [2] recently published a simulation intended to demonstrate a theoretical scenario in which empirical calibration may not be recommended. We welcome a thoughtful debate about the theoretical and empirical underpinnings of empirical calibration and share their enthusiasm for developing practical solutions that can be made broadly applicable to all observational analyses as a means of generating more reliable evidence. However, as we explain in more detail below, we would like to highlight and challenge the premise of some of the concerns raised by Gruber and Tchetgen and demonstrate how our empirical findings support empirical calibration as a robust approach to Type I error control. We believe their simulations are not realistic: the simulated estimates for the negative controls showed severe bias, with an odds ratio (OR) of 3, and, perhaps more importantly, this bias was also simulated to be unusually homogeneous across the sample of negative controls. No one would continue their analysis after observing such estimates for their negative controls, and in all our real-world experiments using calibration, we have never seen such bias or homogeneity of negative controls. Gruber and Tchetgen confirm our finding that nominal p-values are strikingly and perhaps dangerously optimistic, yet recommend that we continue to use them. We believe that p-value calibration should be carried out, and the results should be reported. We first would like to make the following points of clarification:
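The calibration procedure itself is not spelled out in this excerpt. As a rough illustration of the approach described in the earlier paper [1], one can fit a Gaussian empirical null distribution N(mu, sigma^2) to the negative-control estimates on the log scale (each observed estimate being treated as the true bias plus sampling noise with its own standard error) and then compute the p-value of a new estimate against that fitted null rather than against a null centered at zero. The Python sketch below is a minimal, hypothetical rendering of that idea; the function names (`fit_empirical_null`, `calibrate_p`), the toy numbers, and the maximum-likelihood details are assumptions for illustration and are not the authors' published implementation.

```python
import numpy as np
from scipy import stats, optimize

def fit_empirical_null(log_rr, se):
    """Fit a Gaussian empirical null N(mu, sigma^2) to negative-control
    estimates (log relative risks whose true effect is assumed to be 0),
    treating each observed estimate i as N(mu, sigma^2 + se_i^2)."""
    log_rr, se = np.asarray(log_rr), np.asarray(se)

    def neg_log_lik(params):
        mu, log_sigma = params
        total_var = np.exp(log_sigma) ** 2 + se ** 2
        return -np.sum(stats.norm.logpdf(log_rr, loc=mu, scale=np.sqrt(total_var)))

    res = optimize.minimize(neg_log_lik, x0=[0.0, np.log(0.1)], method="Nelder-Mead")
    return res.x[0], np.exp(res.x[1])  # mu, sigma

def calibrate_p(log_rr_new, se_new, mu, sigma):
    """Two-sided p-value of a new estimate under the fitted empirical null,
    which absorbs both systematic error (mu, sigma) and random error (se_new)."""
    z = (log_rr_new - mu) / np.sqrt(sigma ** 2 + se_new ** 2)
    return 2 * stats.norm.sf(abs(z))

# Hypothetical negative-control estimates: log(OR) and standard errors.
nc_log_rr = np.log([1.10, 0.95, 1.25, 0.80, 1.05, 1.15, 0.90, 1.30])
nc_se = np.array([0.10, 0.12, 0.15, 0.11, 0.09, 0.20, 0.14, 0.18])
mu, sigma = fit_empirical_null(nc_log_rr, nc_se)

# A hypothetical outcome of interest: OR = 1.5 with standard error 0.12 on the log scale.
p_nominal = 2 * stats.norm.sf(abs(np.log(1.5) / 0.12))
p_calibrated = calibrate_p(np.log(1.5), 0.12, mu, sigma)
print(f"nominal p = {p_nominal:.4f}, calibrated p = {p_calibrated:.4f}")
```

When the negative controls exhibit residual bias or extra dispersion, the calibrated p-value is larger than the nominal one, which is the sense in which calibration guards against the optimism of nominal p-values discussed above.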
