Estimating heterogeneous treatment effects is crucial for informing personalized treatment strategies and policies. While multiple studies can improve the accuracy and generalizability of results, leveraging them for estimation is statistically challenging. Existing approaches often assume identical heterogeneous treatment effects across studies, but this assumption may be violated due to various sources of between-study heterogeneity, including differences in study design, confounders, and sample characteristics. To address this, we propose a unifying framework for multi-study heterogeneous treatment effect estimation that is robust to between-study heterogeneity in the nuisance functions and treatment effects. Our approach, the multi-study R-learner, extends the R-learner to obtain principled statistical estimation with modern machine learning (ML) in the multi-study setting. The multi-study R-learner is easy to implement and flexible in its ability to incorporate ML for estimating heterogeneous treatment effects, nuisance functions, and membership probabilities, allowing it to borrow strength across heterogeneous studies. It achieves robustness in confounding adjustment through its loss function and can leverage both randomized controlled trials and observational studies. We provide asymptotic guarantees for the proposed method in the case of series estimation and illustrate, using real cancer data, that it achieves the lowest estimation error among the approaches compared in the presence of between-study heterogeneity.
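To make the ingredients named above concrete, the following minimal Python sketch fits study-specific treatment effects by least squares on a residualized outcome. The abstract does not state the loss, so the form used here is an assumption: the standard R-learner squared loss, with the single term (A − e(X))τ(X) replaced by a membership-weighted sum over studies, Σ_k p(k|X)(A − e_k(X))τ_k(X). The names `basis`, `multi_study_design`, and `fit_multi_study_r` are hypothetical, the two-term linear basis stands in for a generic series estimator, and the nuisance functions and membership probabilities are supplied as known rather than estimated with ML or cross-fitting.

```python
import numpy as np

def basis(x):
    """Hypothetical series basis: intercept plus linear term."""
    return np.column_stack([np.ones_like(x), x])

def multi_study_design(x, a, e, p):
    """Build regressors p_k(x) * (a - e_k(x)) * basis(x), one block per study k.

    x: (n,) covariate; a: (n,) binary treatment;
    e: (n, K) study-specific propensity scores (assumed known here);
    p: (n, K) study membership probabilities (assumed known here).
    """
    b = basis(x)                       # (n, d)
    w = p * (a[:, None] - e)           # (n, K) membership-weighted treatment residuals
    # (n, K, d) -> (n, K*d); columns k*d .. k*d+d-1 belong to tau_k
    return (w[:, :, None] * b[:, None, :]).reshape(len(x), -1)

def fit_multi_study_r(x, a, y, m, e, p):
    """Least-squares minimizer of the assumed loss
    mean((y - m(x) - sum_k p_k(x) * (a - e_k(x)) * tau_k(x))**2)
    over tau_k(x) = basis(x) @ coef[k]."""
    z = multi_study_design(x, a, e, p)
    coef, *_ = np.linalg.lstsq(z, y - m, rcond=None)
    return coef.reshape(p.shape[1], -1)  # row k holds the coefficients of tau_k

# Noise-free synthetic data generated exactly from the assumed working model,
# so this only checks the algebra; it is not evidence of statistical performance.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-2, 2, n)
p1 = 1 / (1 + np.exp(-x))
p = np.column_stack([p1, 1 - p1])                        # two studies
e = np.column_stack([np.full(n, 0.5), np.full(n, 0.3)])  # study-specific propensities
a = rng.binomial(1, 0.5, n).astype(float)
true_coef = np.array([[1.0, 0.5], [-1.0, 2.0]])          # tau_k(x) = c0 + c1 * x
tau = basis(x) @ true_coef.T                             # (n, 2), column k is tau_k(x)
m = np.sin(x)                                            # stand-in conditional mean outcome
y = m + np.sum(p * (a[:, None] - e) * tau, axis=1)

est = fit_multi_study_r(x, a, y, m, e, p)
```

With a single study (K = 1 and p identically 1) the design reduces to the usual R-learner regression of Y − m(X) on (A − e(X)) times the basis. In practice the nuisance functions and membership probabilities would be estimated, typically with cross-fitting, which this sketch omits.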