In the health and social sciences, it is critically important to identify subgroups of the study population whose treatment effects differ notably from the population average, that is, subgroups that exhibit heterogeneity of treatment effects (HTE). Decision trees have been proposed and widely adopted for data-driven discovery of HTE because they are highly interpretable. However, single-tree discovery of HTE can be unstable and oversimplified. This paper introduces Causal Rule Ensemble (CRE), a new method for HTE discovery and estimation through an ensemble-of-trees approach. CRE offers several key features, including 1) an interpretable representation of the HTE; 2) the ability to explore complex heterogeneity patterns; and 3) high stability in subgroup discovery. The discovered subgroups are defined in terms of interpretable decision rules. Estimation of subgroup-specific causal effects is performed via a two-stage approach for which we provide theoretical guarantees. Via simulations, we show that the CRE method is highly competitive with state-of-the-art techniques. Finally, we apply CRE to discover the heterogeneous health effects of exposure to air pollution on mortality for 35.3 million Medicare beneficiaries across the contiguous U.S.
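To make the idea of rule-based subgroup discovery followed by separate subgroup-effect estimation concrete, the following is a minimal sketch under strong simplifying assumptions; it is not the authors' implementation. Everything in it is hypothetical and chosen for illustration: the simulated data, the function names, the exhaustive enumeration of short rules over binary covariates (a stand-in for rules extracted from an ensemble of trees), the inverse-probability-weighted pseudo-outcome with a known propensity of 0.5, and the selection of rules by a simple difference in means. Discovery uses one half of the sample and estimation uses the other half.

```python
from itertools import combinations, product

import numpy as np


def simulate(n=20000, seed=0):
    """Toy randomized experiment; the effect is 2.0 only when X0 == 1 and X1 == 0."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n, 3))          # three binary covariates
    Z = rng.integers(0, 2, size=n)               # randomized treatment, P(Z = 1) = 0.5
    tau = 2.0 * ((X[:, 0] == 1) & (X[:, 1] == 0))
    Y = X[:, 2] + tau * Z + rng.normal(size=n)
    return X, Z, Y


def pseudo_outcome(Z, Y, p=0.5):
    """IPW transform: its conditional mean equals the conditional treatment
    effect when p is the true propensity score (assumed known here)."""
    return Y * (Z / p - (1 - Z) / (1 - p))


def candidate_rules(n_features, max_len=2):
    """All conjunctions of up to max_len literals (feature == value).
    Assumption: this enumeration replaces tree-ensemble rule extraction."""
    rules = []
    for k in range(1, max_len + 1):
        for feats in combinations(range(n_features), k):
            for vals in product([0, 1], repeat=k):
                rules.append(tuple(zip(feats, vals)))
    return rules


def rule_mask(rule, X):
    """Boolean vector marking the units that satisfy every literal of the rule."""
    mask = np.ones(len(X), dtype=bool)
    for j, v in rule:
        mask &= X[:, j] == v
    return mask


def discover(X, tau_hat, rules, top_k=1):
    """Keep the top_k rules with the largest inside-vs-outside gap in mean pseudo-outcome."""
    scores = []
    for r in rules:
        m = rule_mask(r, X)
        scores.append(abs(tau_hat[m].mean() - tau_hat[~m].mean()))
    order = np.argsort(scores)[::-1]
    return [rules[i] for i in order[:top_k]]


def estimate(X, tau_hat, selected):
    """Least-squares fit of the pseudo-outcome on an intercept plus rule indicators.
    coef[0] is the baseline effect; coef[1:] are the additive effects of each rule."""
    cols = [np.ones(len(X))] + [rule_mask(r, X).astype(float) for r in selected]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), tau_hat, rcond=None)
    return coef


X, Z, Y = simulate()
tau_hat = pseudo_outcome(Z, Y)
half = len(Y) // 2                                # sample split: discovery vs. estimation
rules = candidate_rules(X.shape[1])
selected = discover(X[:half], tau_hat[:half], rules)
coef = estimate(X[half:], tau_hat[half:], selected)
print("selected rules:", selected)
print("baseline effect and rule effects:", coef)
```

In this toy setting the selected rule is expected to be the conjunction `X0 == 1 and X1 == 0`, with an estimated additive effect near the simulated value of 2.0. Splitting the sample keeps the rule search from contaminating the effect estimates, which is one plausible reading of a two-stage design; the details of the actual CRE procedure are given in the paper itself.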