Many major works in social science use matching to draw causal conclusions, but different matches on the same data may produce different treatment effect estimates, even when they achieve similar balance or minimize the same loss function. We discuss the causes and consequences of this problem. We present evidence of it by replicating ten papers that use matching, finding that different popular matching algorithms produce inconsistent results. We introduce Matching Bounds: a finite-sample, nonstochastic method that allows analysts to determine whether their data admit a matched sample that yields different results at the same levels of balance and overall match quality. We apply Matching Bounds to replications of two studies and show that in one case the results are robust to this issue and in the other they are not.