In health and social sciences, it is critically important to identify subgroups of the study population in which the causal effect of a treatment differs notably from the average treatment effect. Data-driven discovery of heterogeneous treatment effects (HTE) via decision tree methods has been proposed for this task. Despite its high interpretability, single-tree discovery of HTE tends to be highly unstable and to find an oversimplified representation of treatment heterogeneity. To address these shortcomings, we propose Causal Rule Ensemble (CRE), a new method to discover heterogeneous subgroups through an ensemble-of-trees approach. CRE has the following features: it 1) provides an interpretable representation of the HTE; 2) allows extensive exploration of complex heterogeneity patterns; and 3) guarantees high stability in the discovery. The discovered subgroups are defined in terms of interpretable decision rules, and we develop a general two-stage approach for estimating subgroup-specific conditional causal effects, with theoretical guarantees. Via simulations, we show that CRE has strong discovery ability and competitive estimation performance when compared to state-of-the-art techniques. Finally, we apply CRE to discover the subgroups most vulnerable to the effects of air pollution exposure on mortality among 35.3 million Medicare beneficiaries across the contiguous U.S.