The estimation of Conditional Average Treatment Effects (CATE) is crucial for understanding the heterogeneity of treatment effects in clinical trials. We evaluate the performance of common methods, including causal forests and various meta-learners, across a diverse set of scenarios, and find that each method struggles in at least one of the tested scenarios. Because the data-generating process is inherently uncertain in real-life settings, robustness across scenarios is critical for the reliability of a CATE estimator. To address this limitation of existing methods, we propose two new ensemble methods that integrate multiple estimators to improve prediction stability and performance: the Stacked X-Learner, which uses the X-Learner with model stacking to estimate the nuisance functions, and Consensus Based Averaging (CBA), which averages only the models with the highest internal agreement. We show that these models achieve good performance across a wide range of scenarios varying in complexity, sample size and structure of the underlying mechanism, including a biologically driven model of the PD-L1 inhibition pathway in cancer treatment. Furthermore, we show that the Stacked X-Learner also outperforms other ensemble methods, including R-Stacking, Causal-Stacking and others.
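The abstract describes CBA only as averaging "the models with the highest internal agreement." The sketch below shows one possible reading of that idea, and the specific choices are assumptions, not the authors' definition. It measures disagreement between two estimators as the mean squared difference of their CATE predictions on the same units. The subset size `k` is treated as a user-chosen parameter, and the code searches exhaustively for the `k`-model subset with the lowest mean pairwise disagreement. The function names and the example model names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def disagreement(a, b):
    """Mean squared difference between two CATE prediction vectors (assumed metric)."""
    return mean((x - y) ** 2 for x, y in zip(a, b))


def consensus_average(predictions, k):
    """Hypothetical consensus-based averaging sketch.

    predictions: dict mapping model name -> list of CATE estimates
                 for the same units, in the same order.
    k:           number of models to keep (assumed tuning parameter, k >= 2).

    Selects the k-model subset with the lowest mean pairwise disagreement
    (i.e. the highest internal agreement) and returns that subset together
    with the unit-wise average of its predictions.
    """
    names = list(predictions)
    if not 2 <= k <= len(names):
        raise ValueError("k must satisfy 2 <= k <= number of models")

    def subset_disagreement(subset):
        # Average disagreement over all pairs inside the candidate subset.
        return mean(disagreement(predictions[a], predictions[b])
                    for a, b in combinations(subset, 2))

    # Exhaustive search over k-subsets; feasible for a handful of base estimators.
    best = min(combinations(names, k), key=subset_disagreement)

    n_units = len(predictions[names[0]])
    averaged = [mean(predictions[m][i] for m in best) for i in range(n_units)]
    return list(best), averaged


if __name__ == "__main__":
    # Toy predictions on three units; "outlier" disagrees with the rest.
    preds = {
        "S-Learner": [1.0, 2.0, 3.0],
        "T-Learner": [1.1, 2.1, 2.9],
        "X-Learner": [0.9, 1.9, 3.1],
        "outlier": [5.0, -1.0, 0.0],
    }
    chosen, cate = consensus_average(preds, k=3)
    print(chosen, cate)
```

In this toy example, the three estimators that roughly agree are kept and the outlying one is excluded from the average. Exhaustive subset search grows combinatorially with the number of estimators, so a larger ensemble would need a greedy or threshold-based rule instead. That rule is likewise an assumption here.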