Randomized trials are considered the gold standard for making informed decisions in medicine, yet they often lack generalizability to the patient populations seen in clinical practice. Observational studies, on the other hand, cover a broader patient population but are prone to various biases. Thus, before using an observational study for decision-making, it is crucial to benchmark its treatment effect estimates against those derived from a randomized trial. We propose a novel strategy to benchmark observational studies beyond the average treatment effect. First, we design a statistical test for the null hypothesis that the treatment effects estimated from the two studies, conditioned on a set of relevant features, differ by at most a given tolerance. We then estimate an asymptotically valid lower bound on the maximum bias strength for any subgroup in the observational study. Finally, we validate our benchmarking strategy in a real-world setting and show that it leads to conclusions that align with established medical knowledge.
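To make the two steps concrete, the following is a minimal sketch and not the test proposed in this work. It assumes discrete, pre-specified subgroups, unadjusted difference-in-means estimates in both studies, and a conservative Bonferroni correction; the function names `diff_in_means`, `bias_lower_bound`, and `reject_tolerance` are hypothetical. It computes a lower confidence bound on the largest subgroup-level discrepancy between the two studies and rejects the tolerance null when that bound exceeds the tolerance.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_in_means(treated, control):
    """Unadjusted difference-in-means effect estimate and its standard error."""
    est = mean(treated) - mean(control)
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return est, se


def bias_lower_bound(rct, obs, alpha=0.05):
    """Lower confidence bound on the largest subgroup discrepancy.

    `rct` and `obs` map a subgroup name to (treated outcomes, control outcomes).
    Assumption: normal approximation with a Bonferroni correction over the
    subgroups, which is a simplification chosen for illustration.
    """
    groups = sorted(set(rct) & set(obs))
    # Two-sided critical value, split across the subgroups being compared.
    z = NormalDist().inv_cdf(1 - alpha / (2 * len(groups)))
    bounds = {}
    for g in groups:
        tau_rct, se_rct = diff_in_means(*rct[g])
        tau_obs, se_obs = diff_in_means(*obs[g])
        se = sqrt(se_rct**2 + se_obs**2)  # the two studies are independent
        bounds[g] = max(0.0, abs(tau_rct - tau_obs) - z * se)
    return max(bounds.values()), bounds


def reject_tolerance(rct, obs, delta, alpha=0.05):
    """Reject H0 'every subgroup discrepancy is at most delta'."""
    lower, _ = bias_lower_bound(rct, obs, alpha)
    return lower > delta
```

Inverting the test in this way means the returned bound is the largest tolerance that the data still reject, which is one simple way to read a lower bound on the maximum bias strength.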