Identifying differential treatment effects and making statistical inferences about them (commonly known as subgroup analysis in clinical research) is central to precision health. Subgroup analysis allows practitioners to pinpoint populations for whom a treatment is especially beneficial or protective, thereby advancing targeted interventions. Tree-based recursive partitioning methods are widely used for subgroup analysis because of their interpretability. Nevertheless, these approaches have significant limitations, including suboptimal partitions induced by greedy heuristics and overfitting from locally estimated splits, especially when sample sizes are limited. To address these limitations, we propose a fused optimal causal tree method that uses mixed-integer optimization (MIO) to identify subgroups precisely. Our approach ensures globally optimal partitions and introduces a parameter fusion constraint that shares information across related subgroups. This design substantially improves the accuracy of subgroup discovery and enhances statistical efficiency. We provide theoretical guarantees by rigorously establishing out-of-sample risk bounds and comparing them with those of classical tree-based methods. Empirically, our method consistently outperforms popular baselines in simulations. Finally, we demonstrate its practical utility through a case study on the Health and Aging Brain Study Health Disparities (HABS-HD) dataset, where our approach yields clinically meaningful insights.