Why We Should Not Be Indifferent to Specification Choices for Difference‐in‐Differences
Author(s) - Ryan Andrew M., Burgess James F., Dimick Justin B.
Publication year - 2015
Publication title - Health Services Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.706
H-Index - 121
eISSN - 1475-6773
pISSN - 0017-9124
DOI - 10.1111/1475-6773.12270
Subject(s) - estimator, statistics, matching (statistics), average treatment effect, propensity score matching, standard deviation, standard error, point estimation, mathematics, econometrics, inference, permutation (music), percentage point, Monte Carlo method, statistical hypothesis testing, null hypothesis, computer science, physics, artificial intelligence, acoustics
Objective. To evaluate the effects of specification choices on the accuracy of estimates in difference‐in‐differences (DID) models.

Data Sources. Process‐of‐care quality data from Hospital Compare between 2003 and 2009.

Study Design. We performed a Monte Carlo simulation experiment to estimate the effect of an imaginary policy on quality. The experiment was performed for three different scenarios in which the probability of treatment was (1) unrelated to pre‐intervention performance; (2) positively correlated with pre‐intervention levels of performance; and (3) positively correlated with pre‐intervention trends in performance. We estimated alternative DID models that varied with respect to the choice of data intervals, the comparison group, and the method of obtaining inference. We assessed estimator bias as the mean absolute deviation between estimated program effects and their true value. We evaluated the accuracy of inferences through statistical power and rates of false rejection of the null hypothesis.

Principal Findings. Performance of alternative specifications varied dramatically when the probability of treatment was correlated with pre‐intervention levels or trends. In these cases, propensity score matching resulted in much more accurate point estimates. The use of permutation tests resulted in lower false rejection rates for the highly biased estimators, but the use of clustered standard errors resulted in slightly lower false rejection rates for the matching estimators.

Conclusions. When treatment and comparison groups differed on pre‐intervention levels or trends, our results supported specifications for DID models that include matching for more accurate point estimates and models using clustered standard errors or permutation tests for better inference. Based on our findings, we propose a checklist for DID analysis.
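
As a rough illustration of the kind of experiment the abstract describes, the Python sketch below simulates a two‐period panel, computes a simple DID estimate, obtains a permutation‐test p‐value by reshuffling treatment labels, and estimates a false rejection rate under a true effect of zero. The data‐generating process, the function names, and all parameter values are assumptions made for illustration and are not taken from the paper. The sketch covers only scenario (1), where treatment is unrelated to pre‐intervention performance, and it does not use Hospital Compare data, propensity score matching, or clustered standard errors.

```python
import random
from statistics import mean


def simulate_units(n_units, effect, rng):
    """Hypothetical two-period panel; treatment is unrelated to baseline quality."""
    units = []
    for _ in range(n_units):
        baseline = rng.gauss(0.0, 1.0)               # unit-specific quality level
        treated = rng.random() < 0.5                 # scenario (1): random assignment
        pre = baseline + rng.gauss(0.0, 0.5)
        post = baseline + 0.3 + rng.gauss(0.0, 0.5)  # 0.3 is an assumed common trend
        if treated:
            post += effect                           # true program effect
        units.append((treated, pre, post))
    return units


def did_estimate(units):
    """Mean pre-to-post change among treated minus mean change among comparison."""
    treated = [post - pre for t, pre, post in units if t]
    comparison = [post - pre for t, pre, post in units if not t]
    return mean(treated) - mean(comparison)


def permutation_p_value(units, n_perm, rng):
    """Two-sided p-value from reshuffling treatment labels across units."""
    observed = abs(did_estimate(units))
    labels = [t for t, _, _ in units]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps the number of treated units fixed
        shuffled = [(lab, pre, post) for lab, (_, pre, post) in zip(labels, units)]
        if abs(did_estimate(shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def false_rejection_rate(n_sims, n_units, n_perm, alpha, seed):
    """Share of simulations with a true effect of zero in which p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        units = simulate_units(n_units, 0.0, rng)
        if permutation_p_value(units, n_perm, rng) < alpha:
            rejections += 1
    return rejections / n_sims
```

Calling `false_rejection_rate(n_sims=200, n_units=100, n_perm=500, alpha=0.05, seed=1)` would give a Monte Carlo estimate of the false rejection rate for this toy setup. Extending the sketch to scenarios (2) and (3) would require making the treatment probability depend on the baseline level or trend, which is left out here.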