Background: Policy evaluation studies that assess how state-level policies affect health-related outcomes are foundational to health and social policy research. The relative ability of newer analytic methods to address confounding, a key source of bias in observational studies, has not been closely examined. Methods: We conducted a simulation study to examine how differing magnitudes of confounding affect the performance of four methods used for policy evaluations: (1) the two-way fixed effects (TWFE) difference-in-differences (DID) model; (2) a one-period lagged autoregressive (AR) model; (3) the augmented synthetic control method (ASCM); and (4) the doubly robust DID approach with multiple time periods from Callaway and Sant'Anna (CSA). We simulated data with staggered policy adoption under multiple confounding scenarios (i.e., varying the magnitude and nature of confounding relationships). Results: Bias increased for each method (1) as confounding magnitude increased; (2) when confounding was generated with respect to prior outcome trends (rather than levels); and (3) when confounding associations were nonlinear (rather than linear). The AR model and ASCM had notably lower root mean squared error than the TWFE model and the CSA approach in all scenarios except nonlinear confounding by prior trends, where CSA performed best. Coverage rates were unreasonably high for ASCM (e.g., 100%), reflecting large model-based standard errors and wide confidence intervals in practice. Conclusions: Our simulation study indicated that no single method consistently outperforms the others; a researcher's toolkit should therefore include all of these methodological options. Our simulations and associated R package can help researchers choose the most appropriate approach for their data.
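Illustration: The kind of simulation described in the Methods can be sketched in a few lines. The Python example below is a minimal, hypothetical sketch and is not the authors' R package or their data-generating process. It simulates a balanced panel with staggered adoption, optionally adds confounding by prior trends (adopting units follow a steeper outcome trend), and computes the TWFE DID coefficient by two-way demeaning. The function names (`simulate_panel`, `twfe_estimate`) and all parameter values are assumptions chosen for illustration.

```python
import random


def simulate_panel(n_units=40, n_periods=12, tau=2.0, noise_sd=1.0,
                   trend_confounding=0.0, seed=0):
    """Simulate a balanced unit-by-period panel with staggered policy adoption.

    Hypothetical data-generating process (an assumption, not the study's):
    y_it = unit effect + common time trend
           + trend_confounding * adopter_i * t + tau * treated_it + noise.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        unit_effect = rng.gauss(0.0, 1.0)
        # Every other unit adopts, at a randomly drawn period (staggered timing).
        adopt = rng.randint(4, n_periods - 2) if i % 2 == 0 else None
        adopter = 1.0 if adopt is not None else 0.0
        for t in range(n_periods):
            treated = 1.0 if adopt is not None and t >= adopt else 0.0
            y = (unit_effect + 0.3 * t
                 + trend_confounding * adopter * t  # violates parallel trends
                 + tau * treated
                 + rng.gauss(0.0, noise_sd))
            rows.append((i, t, treated, y))
    return rows


def twfe_estimate(rows):
    """TWFE DID coefficient via two-way demeaning (valid for a balanced panel)."""
    units = sorted({r[0] for r in rows})
    periods = sorted({r[1] for r in rows})

    def demean(idx):
        # Subtract unit means and period means, then add back the grand mean.
        vals = {(r[0], r[1]): r[idx] for r in rows}
        u_mean = {i: sum(vals[i, t] for t in periods) / len(periods) for i in units}
        t_mean = {t: sum(vals[i, t] for i in units) / len(units) for t in periods}
        grand = sum(vals.values()) / len(vals)
        return {k: v - u_mean[k[0]] - t_mean[k[1]] + grand for k, v in vals.items()}

    d, y = demean(2), demean(3)
    # Regress the demeaned outcome on the demeaned treatment indicator.
    return sum(d[k] * y[k] for k in d) / sum(d[k] ** 2 for k in d)


# Compare the estimate without and with confounding by prior trends.
for c in (0.0, 0.2):
    est = twfe_estimate(simulate_panel(trend_confounding=c))
    print(f"trend_confounding={c}: TWFE estimate = {est:.3f} (true effect 2.0)")
```

With `trend_confounding=0.0` and a homogeneous effect, parallel trends hold and the estimate is centered on the true effect. A positive `trend_confounding` pushes the estimate upward, which mirrors the abstract's finding that confounding by prior outcome trends increases bias. A fuller comparison would repeat this over many seeds and add the AR, ASCM, and CSA estimators.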