In health and social sciences, it is critically important to identify subgroups of the study population whose treatment effects differ notably from the population average, that is, subgroups exhibiting heterogeneity of treatment effects (HTE). Decision trees have been proposed and widely adopted for the data-driven discovery of HTE because they are highly interpretable. However, single-tree discovery of HTE can be unstable and oversimplified. This paper introduces the Causal Rule Ensemble (CRE), a new method for HTE discovery and estimation using an ensemble-of-trees approach. CRE offers several key features, including 1) an interpretable representation of the HTE; 2) the ability to explore complex heterogeneity patterns; and 3) high stability in subgroup discovery. The discovered subgroups are defined in terms of interpretable decision rules. Estimation of subgroup-specific causal effects is performed via a two-stage approach, for which we provide theoretical guarantees. Through simulations, we show that the CRE method is highly competitive with state-of-the-art techniques. Finally, we apply CRE to discover the heterogeneous health effects of exposure to air pollution on mortality for 35.3 million Medicare beneficiaries across the contiguous U.S.