To test scientific theories and develop individualized treatment rules, researchers often wish to learn heterogeneous treatment effects (HTE) that hold consistently across diverse populations and contexts. We consider the problem of generalizing HTE based on data from multiple sites. A key challenge is that a target population may differ from the source sites in unknown and unobservable ways. As a result, estimates from site-specific models lack external validity, and a simple pooled analysis risks bias. We develop a robust methodology for estimating the conditional average treatment effect (CATE) with multisite data from heterogeneous populations. We propose a minimax-regret framework that learns a generalizable CATE model by minimizing the worst-case regret over a class of target populations whose CATE can be represented as convex combinations of site-specific CATEs. Using robust optimization, the proposed methodology accounts for distribution shifts in both individual covariates and treatment effect heterogeneity across sites. We show that the resulting CATE model has an interpretable closed-form solution, expressed as a weighted average of site-specific CATE models. Thus, researchers can use any flexible CATE estimation method within each site and aggregate the site-specific estimates to produce the final model. Through simulations and a real-world application, we show that the proposed methodology improves on the robustness and generalizability of existing approaches.
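To make the aggregation step concrete, the following is a minimal sketch of how minimax-regret weights over site-specific CATE models might be computed. It is an illustration under simplifying assumptions that are ours, not necessarily the exact formulation of the proposed methodology: regret is taken to be the mean squared difference between the aggregate model and a target CATE, all site models are evaluated on one shared sample of covariate points (so covariate shift is ignored here), and the aggregate model is unrestricted. Under these assumptions the worst case over convex combinations is attained at a single site, and the minimax model is a weighted average whose weights maximize a concave quadratic over the simplex. The sketch solves that problem with Frank-Wolfe steps and exact line search. The names `gram`, `site_regrets`, and `minimax_weights` are hypothetical.

```python
def gram(tau):
    """Second-moment matrix G[a][b] = mean_i tau[i][a] * tau[i][b].

    tau[i][s] is the CATE predicted by site s's model at evaluation point i.
    """
    n, k = len(tau), len(tau[0])
    return [[sum(row[a] * row[b] for row in tau) / n for b in range(k)]
            for a in range(k)]


def site_regrets(G, w):
    """Regret of the aggregate model sum_t w[t] * tau_t against each site.

    Entry s equals mean_i (sum_t w[t] * tau[i][t] - tau[i][s]) ** 2,
    computed from the second-moment matrix G.
    """
    k = len(w)
    Gw = [sum(G[s][t] * w[t] for t in range(k)) for s in range(k)]
    wGw = sum(w[s] * Gw[s] for s in range(k))
    return [wGw - 2.0 * Gw[s] + G[s][s] for s in range(k)]


def minimax_weights(tau, n_iter=2000, tol=1e-9):
    """Approximate simplex weights minimizing the worst site regret.

    Frank-Wolfe on the concave dual: each step shifts weight toward the
    site whose CATE the current aggregate fits worst.
    """
    G = gram(tau)
    k = len(G)
    w = [1.0 / k] * k  # start from equal weights
    for _ in range(n_iter):
        r = site_regrets(G, w)
        j = max(range(k), key=lambda s: r[s])  # currently worst-served site
        # Gap between the worst regret and the weighted-average regret;
        # it is zero at the minimax solution.
        gap = r[j] - sum(w[s] * r[s] for s in range(k))
        if gap <= tol or r[j] <= tol:
            break
        # Exact line search for the quadratic dual objective, clipped to [0, 1].
        step = min(1.0, gap / (2.0 * r[j]))
        w = [(1.0 - step) * w[s] + (step if s == j else 0.0) for s in range(k)]
    return w
```

As a usage example, with two sites whose CATEs are mirror images, `minimax_weights([[x, -x] for x in (-1.0, 0.0, 1.0, 2.0)])` leaves the equal starting weights unchanged, since both sites are then fit equally well. With three sites whose CATEs are the constants 0, 1, and 4, almost all weight moves to the two extreme sites, because the middle site lies inside the range they span and is never the worst case.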