Interpretability is a key concern when estimating heterogeneous treatment effects with machine learning methods, especially in healthcare applications, where high-stakes decisions are often made. Inspired by the Predictive, Descriptive, Relevant framework of interpretability, we propose causal rule learning, which finds a refined set of causal rules characterizing potential subgroups in order to estimate heterogeneous treatment effects and improve our understanding of them. Causal rule learning involves three phases: rule discovery, rule selection, and rule analysis. In the rule discovery phase, we use a causal forest to generate a pool of causal rules with corresponding subgroup average treatment effects. The rule selection phase then employs a D-learning method to select a subset of these rules and to decompose individual-level treatment effects into a linear combination of the subgroup-level effects. This helps to answer a question that previous literature has ignored: what if an individual simultaneously belongs to multiple groups with different average treatment effects? The rule analysis phase outlines a detailed procedure for analyzing each rule in the subset from multiple perspectives, revealing the most promising rules for further validation. The rules themselves, their corresponding subgroup treatment effects, and their weights in the linear combination give us more insight into heterogeneous treatment effects. Simulation and real-world data analysis demonstrate the superior performance of causal rule learning in the interpretable estimation of heterogeneous treatment effects when the ground truth is complex and the sample size is sufficient.
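To make the linear-combination idea concrete, the following is a minimal sketch of how an individual-level effect could be assembled from the rules an individual satisfies. It is an illustration under stated assumptions, not the authors' implementation: the `CausalRule` class, the `individual_effect` function, the two example rules, and all numeric values are hypothetical, and the exact functional form used by the D-learning step may differ from the plain weighted sum shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Patient = Dict[str, float]


@dataclass
class CausalRule:
    """A hypothetical container for one selected rule (not from the paper)."""
    description: str                     # human-readable subgroup definition
    applies: Callable[[Patient], bool]   # True if the individual is in the subgroup
    subgroup_ate: float                  # subgroup average treatment effect
    weight: float                        # weight in the linear combination


def individual_effect(patient: Patient, rules: List[CausalRule]) -> float:
    """Assumed form: sum of weight * subgroup ATE over the rules that apply."""
    return sum(r.weight * r.subgroup_ate for r in rules if r.applies(patient))


# Illustrative rules and values only; none of these come from the paper.
rules = [
    CausalRule("age > 60", lambda p: p["age"] > 60, subgroup_ate=0.5, weight=0.5),
    CausalRule("bmi > 30", lambda p: p["bmi"] > 30, subgroup_ate=-0.25, weight=2.0),
]

# An individual who belongs to both subgroups at once.
both = {"age": 70, "bmi": 32}
# An individual who belongs only to the first subgroup.
age_only = {"age": 70, "bmi": 25}

effect_both = individual_effect(both, rules)
effect_age_only = individual_effect(age_only, rules)
```

In this sketch, an individual covered by several rules receives a single effect estimate that combines the subgroup effects through their weights, which is one way to read the question of simultaneous membership in multiple groups.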