Simpson's paradox in the integrated discrimination improvement
Author(s) - Chipman J., Braun D.
Publication year - 2016
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.6862
Subject(s) - brier score, covariate, metric (unit), receiver operating characteristic, calibration, statistics, computer science, goodness of fit, performance metric, stratum, econometrics, mathematics, biology, paleontology, operations management, management, economics
The integrated discrimination improvement (IDI) is commonly used to compare two risk prediction models; it summarizes the extent to which a new model increases risk in events and decreases risk in non‐events. The IDI averages risks across events and non‐events and is therefore susceptible to Simpson's paradox. In some settings, adding a predictive covariate to a well calibrated model results in an overall negative (positive) IDI. However, if stratified by that same covariate, the strata‐specific IDIs are positive (negative). Meanwhile, the calibration (observed to expected ratio and Hosmer–Lemeshow Goodness of Fit Test), area under the receiver operating characteristic curve, and Brier score improve overall and by stratum. We ran extensive simulations to investigate the impact of an imbalanced covariate upon metrics (IDI, area under the receiver operating characteristic curve, Brier score, and R²), provide an analytic explanation for the paradox in the IDI, and use an investigative metric, a Weighted IDI, to better understand the paradox. In simulations, all instances of the paradox occurred under stratum‐specific mis‐calibration, yet there were mis‐calibrated settings in which the paradox did not occur. The paradox is illustrated on Cancer Genomics Network data by calculating predictions based on two versions of BRCAPRO, a Mendelian risk prediction model for breast and ovarian cancer. In both simulations and the Cancer Genomics Network data, overall model calibration did not guarantee stratum‐level calibration. We conclude that the IDI should only assess model performance among a clinically relevant subset when stratum‐level calibration is strictly met and recommend calculating additional metrics to confirm the direction and conclusions of the IDI. Copyright © 2016 John Wiley & Sons, Ltd.
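
As a rough illustration of how averaging across strata can flip the sign of the IDI, the sketch below computes the IDI in its standard form (mean change in predicted risk among events minus mean change among non‐events), overall and by stratum. The function names `idi` and `idi_by_stratum` and all numbers are hypothetical, chosen only to make the reversal visible; they are not taken from the paper's simulations or the Cancer Genomics Network data, and the paper's Weighted IDI is not implemented here.

```python
from statistics import mean


def idi(y, p_old, p_new):
    """IDI = mean risk change among events minus mean risk change among non-events."""
    d_events = [pn - po for yi, po, pn in zip(y, p_old, p_new) if yi == 1]
    d_nonevents = [pn - po for yi, po, pn in zip(y, p_old, p_new) if yi == 0]
    return mean(d_events) - mean(d_nonevents)


def idi_by_stratum(y, p_old, p_new, stratum):
    """Compute the IDI separately within each level of a stratifying covariate."""
    result = {}
    for s in sorted(set(stratum)):
        idx = [i for i, si in enumerate(stratum) if si == s]
        result[s] = idi(
            [y[i] for i in idx],
            [p_old[i] for i in idx],
            [p_new[i] for i in idx],
        )
    return result


# Contrived toy data (an assumption for illustration, not real data):
# stratum "A" holds most events, stratum "B" holds most non-events.
stratum = ["A", "A", "A", "A", "B", "B", "B", "B"]
y       = [1,   1,   1,   0,   1,   0,   0,   0]
# The old model ignores the covariate and predicts 0.5 for everyone.
p_old   = [0.5] * 8
# The new model separates events from non-events within each stratum,
# but is badly mis-calibrated at the stratum level.
p_new   = [0.3, 0.3, 0.3, 0.2, 0.8, 0.7, 0.7, 0.7]

overall = idi(y, p_old, p_new)
per_stratum = idi_by_stratum(y, p_old, p_new, stratum)

print("overall IDI:", round(overall, 3))
print("stratum IDIs:", {s: round(v, 3) for s, v in per_stratum.items()})
```

With these toy inputs the IDI is positive within each stratum but negative overall, because events are concentrated in the stratum where the new model lowers risk and non‐events in the stratum where it raises risk. The new model here is deliberately mis‐calibrated within strata, which matches the condition under which the abstract reports the paradox arising.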
