We introduce Matched Machine Learning, a framework that combines the flexibility of machine learning black boxes with the interpretability of matching, a longstanding tool in observational causal inference. Interpretability is paramount in many high-stakes applications of causal inference. Current tools for nonparametric estimation of both average and individualized treatment effects are black boxes that do not allow human auditing of their estimates. Our framework uses machine learning to learn an optimal metric for matching units and estimating outcomes, thereby achieving the performance of black-box machine learning methods while remaining interpretable. Our general framework encompasses several published works as special cases. We provide asymptotic inference theory for the proposed framework, enabling users to construct approximate confidence intervals around estimates of both individualized and average treatment effects. We show empirically that instances of Matched Machine Learning perform on par with black-box machine learning methods and better than existing matching methods for similar problems. Finally, in an application we show how Matched Machine Learning can be used for causal inference even when covariate data are highly complex: studying an image dataset, we produce high-quality matches and treatment effect estimates.
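The pipeline summarized above (learn a metric with machine learning, match units under it, estimate outcomes from the matched groups) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method. The metric is assumed to be diagonal, with weights equal to the absolute coefficients of a least-squares outcome model fitted on a separate training split. This is a hypothetical stand-in for whatever learner a given instance of the framework uses. The names `learn_diagonal_metric` and `matched_cate`, the weighted L1 distance, k = 5, and the synthetic data are all illustrative choices. The sketch does not implement the paper's asymptotic inference theory.

```python
import numpy as np

def learn_diagonal_metric(X, y):
    """Hypothetical metric learner: least-squares fit of y on X, with the
    absolute coefficients used as per-covariate weights of a diagonal metric."""
    X1 = np.column_stack([np.ones(len(X)), X])       # add intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return np.abs(coef[1:])                           # drop the intercept

def matched_cate(X, y, t, w, k=5):
    """Match each unit to its k nearest treated and k nearest control units
    under the weighted L1 distance; return CATE estimates and matched groups."""
    treated, control = np.flatnonzero(t == 1), np.flatnonzero(t == 0)
    cate = np.empty(len(X))
    groups = []
    for i in range(len(X)):
        d = np.abs(X - X[i]) @ w                      # weighted L1 distance to all units
        nt = treated[np.argsort(d[treated])[:k]]      # k closest treated units
        nc = control[np.argsort(d[control])[:k]]      # k closest control units
        cate[i] = y[nt].mean() - y[nc].mean()         # difference of matched means
        groups.append((nt, nc))                       # auditable matched group
    return cate, groups

# Synthetic example (assumed data): the effect varies with covariate 0 only, true ATE = 1.
rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.normal(size=(n, p))
t = rng.integers(0, 2, size=n)
tau = 1.0 + X[:, 0]
y = 2.0 * X[:, 0] + X[:, 1] + t * tau + rng.normal(scale=0.1, size=n)

# Honest split: learn the metric on one half, match and estimate on the other.
train, est = np.arange(n // 2), np.arange(n // 2, n)
w = learn_diagonal_metric(X[train], y[train])
cate, groups = matched_cate(X[est], y[est], t[est], w, k=5)
ate = cate.mean()
print(f"weights={np.round(w, 2)}  ATE estimate={ate:.2f}")
```

Each estimate is an average over an explicit matched group, so `groups[i]` lists exactly which treated and control units produced unit `i`'s estimate. This is the property that lets a human audit individual estimates. In this sketch the learned weights concentrate on covariates 0 and 1, which drive the outcome, so irrelevant covariates barely affect who is matched to whom.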