Statisticians are increasingly interested in estimating and analyzing heterogeneity in causal effects in observational studies. However, developing a treatment effect estimator usually involves a trade-off between accuracy and interpretability, especially when estimation involves a large number of features. To address this issue, we propose a score-based framework for estimating the Conditional Average Treatment Effect (CATE) function. The framework integrates two components: (i) a matching algorithm that jointly uses propensity and prognostic scores to obtain a proxy of the heterogeneous treatment effect for each observation, and (ii) non-parametric regression trees that construct an estimator of the CATE function conditional on the two scores. The method naturally stratifies treatment effects into subgroups over a 2D grid whose axes are the propensity and prognostic scores. We conduct benchmark experiments on multiple simulated datasets and demonstrate clear advantages of the proposed estimator over state-of-the-art methods. We also evaluate empirical performance in real-life settings, using two observational datasets from a clinical trial and a complex social survey, and interpret the policy implications of the numerical results.
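The two components described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Several things here are assumptions made purely for illustration: the propensity and prognostic scores are simulated and treated as already estimated, matching is 1-nearest-neighbour with replacement under unscaled Euclidean distance, a small hand-rolled regression tree stands in for whatever tree learner the paper uses, and every name (`match_proxy_effects`, `fit_tree`, `predict`) is hypothetical.

```python
import random


def match_proxy_effects(ps, pg, treated, y):
    """Step (i): pair each unit with the nearest opposite-arm unit in score space.

    The treated-minus-control outcome difference is the unit's proxy effect.
    Assumption: unscaled Euclidean distance, matching with replacement.
    """
    n = len(y)
    proxies = []
    for i in range(n):
        opposite = [j for j in range(n) if treated[j] != treated[i]]
        j = min(opposite, key=lambda k: (ps[k] - ps[i]) ** 2 + (pg[k] - pg[i]) ** 2)
        proxies.append(y[i] - y[j] if treated[i] else y[j] - y[i])
    return proxies


def _sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(points, targets, depth=2, min_leaf=20):
    """Step (ii): a small regression tree over (propensity, prognostic) points.

    Returns a float (leaf mean) or a tuple (axis, threshold, left, right).
    """
    mean = sum(targets) / len(targets)
    if depth == 0 or len(targets) < 2 * min_leaf:
        return mean
    best = None
    for axis in (0, 1):
        order = sorted(range(len(points)), key=lambda i: points[i][axis])
        for cut in range(min_leaf, len(order) - min_leaf + 1):
            lo, hi = points[order[cut - 1]][axis], points[order[cut]][axis]
            if lo == hi:
                continue  # cannot split between identical values
            left, right = order[:cut], order[cut:]
            sse = _sse([targets[i] for i in left]) + _sse([targets[i] for i in right])
            if best is None or sse < best[0]:
                best = (sse, axis, (lo + hi) / 2, left, right)
    if best is None:
        return mean
    _, axis, threshold, left, right = best
    return (
        axis,
        threshold,
        fit_tree([points[i] for i in left], [targets[i] for i in left], depth - 1, min_leaf),
        fit_tree([points[i] for i in right], [targets[i] for i in right], depth - 1, min_leaf),
    )


def predict(tree, point):
    """Walk the tree down to a leaf and return its mean proxy effect."""
    while isinstance(tree, tuple):
        axis, threshold, left, right = tree
        tree = left if point[axis] <= threshold else right
    return tree


# Simulated scores (assumed already estimated) and outcomes.
rng = random.Random(0)
n = 400
ps = [rng.uniform(0.2, 0.8) for _ in range(n)]   # propensity scores
pg = [rng.uniform(0.0, 1.0) for _ in range(n)]   # prognostic scores
treated = [1 if rng.random() < p else 0 for p in ps]
true_effect = [2.0 if g > 0.5 else 0.0 for g in pg]  # effect varies with prognosis
y = [g + tau * t + rng.gauss(0.0, 0.1) for g, tau, t in zip(pg, true_effect, treated)]

proxies = match_proxy_effects(ps, pg, treated, y)
tree = fit_tree(list(zip(ps, pg)), proxies, depth=2, min_leaf=20)
cate_high = predict(tree, (0.5, 0.9))  # subgroup with high prognostic score
cate_low = predict(tree, (0.5, 0.1))   # subgroup with low prognostic score
```

Each leaf of the fitted tree is a rectangle in the (propensity, prognostic) plane, which is what gives the subgroup reading over a 2D grid. In practice the scores would first be estimated from the covariates, and putting the two scores on a common scale before matching is probably advisable.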