Traditional matching methods for causal inference are impractical for high-dimensional datasets. They suffer from the curse of dimensionality: exact matching and coarsened exact matching find exponentially fewer matches as the input dimension grows, and propensity score matching may match highly unrelated units together. To overcome this problem, we develop theoretical results that motivate the use of neural networks to obtain non-trivial, multivariate balancing scores of a chosen level of coarseness, in contrast to the classical, scalar propensity score. We leverage these balancing scores to perform matching for high-dimensional causal inference, and we call this procedure neural score matching. We show that our method is competitive with other matching approaches on semi-synthetic high-dimensional datasets, both in estimating treatment effects and in reducing imbalance.
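The claim about coarsened exact matching can be made concrete with a short calculation. The sketch below assumes covariates drawn i.i.d. Uniform(0, 1) and coarsened into k equal-width bins per dimension. Both are illustrative assumptions, not the paper's experimental setup, and the function names are hypothetical. With d covariates there are k^d cells, so the chance that a treated unit shares its cell with at least one of n_c controls is 1 - (1 - k^{-d})^{n_c}. This probability collapses once k^d greatly exceeds n_c. The simulation checks the closed form empirically.

```python
import random


def expected_match_fraction(n_control: int, dim: int, bins: int) -> float:
    """Probability that a treated unit shares its coarsened cell with at least
    one of n_control controls, assuming i.i.d. Uniform(0, 1) covariates and
    `bins` equal-width bins per dimension (bins**dim cells in total)."""
    p_cell = bins ** -dim  # chance that one control lands in a given cell
    return 1.0 - (1.0 - p_cell) ** n_control


def simulated_match_fraction(n_treated: int, n_control: int, dim: int,
                             bins: int, seed: int = 0) -> float:
    """Monte Carlo estimate: fraction of treated units with at least one
    exact match on the coarsened covariates (hypothetical helper)."""
    rng = random.Random(seed)

    def coarsen(x: list[float]) -> tuple[int, ...]:
        # Map each covariate in [0, 1) to a bin index in 0..bins-1.
        return tuple(min(int(v * bins), bins - 1) for v in x)

    def draw() -> list[float]:
        # One unit's covariate vector under the uniform assumption.
        return [rng.random() for _ in range(dim)]

    control_cells = {coarsen(draw()) for _ in range(n_control)}
    matched = sum(coarsen(draw()) in control_cells for _ in range(n_treated))
    return matched / n_treated


if __name__ == "__main__":
    for d in (1, 2, 4, 8, 16):
        print(d,
              expected_match_fraction(500, d, 4),
              simulated_match_fraction(500, 500, d, 4))
```

With 500 controls and 4 bins per covariate, essentially every treated unit is matched at d = 1. At d = 16, there are about 4.3 billion cells, and the expected match rate is on the order of 10^-7. Propensity score matching avoids this collapse by matching on a single scalar. The cost is the one noted above: units with similar scores can still have very different covariates.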