To test scientific theories and develop individualized treatment rules, researchers often wish to learn heterogeneous treatment effects (HTE) that hold consistently across diverse populations and contexts. We consider the problem of generalizing HTE based on data from multiple sites. A key challenge is that a target population may differ from the source sites in unknown and unobservable ways. As a result, estimates from site-specific models may lack external validity, and a simple pooled analysis risks bias. We develop a robust methodology for estimating the conditional average treatment effect (CATE) using multisite data from heterogeneous populations. We propose a minimax-regret framework that learns a generalizable CATE model by minimizing the worst-case regret over a class of target populations whose CATE can be represented as a convex combination of site-specific CATEs. Using robust optimization, the proposed methodology accounts for distribution shifts across sites in both individual covariates and treatment effect heterogeneity. We show that the resulting CATE model has an interpretable closed-form solution, expressed as a weighted average of site-specific CATE models. Thus, researchers can apply a flexible CATE estimation method within each site and aggregate the site-specific estimates to produce the final model. Through simulations and a real-world application, we show that the proposed methodology is more robust and generalizes better than existing approaches.
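The aggregation idea can be illustrated with a small sketch. This is a simplified toy version, not the estimator developed in the paper: it assumes each site's CATE model has already been fitted and evaluated on a common set of covariate points, it ignores covariate shift and estimation uncertainty, and it takes regret to be the mean squared discrepancy between the aggregate model and a target CATE. Under those assumptions the worst case over convex combinations is attained at a single site (the discrepancy is convex in the mixing weights), so the sketch searches for simplex weights that minimize the largest discrepancy to any one site. The function names and the Frank-Wolfe solver are illustrative choices, not taken from the source.

```python
def mean_sq_diff(u, v):
    """Average squared difference between two equal-length prediction vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def aggregate(weights, site_cates):
    """Weighted average of site-specific CATE predictions at each point."""
    n = len(site_cates[0])
    return [sum(w * tau[i] for w, tau in zip(weights, site_cates))
            for i in range(n)]


def worst_case_regret(weights, site_cates):
    """Largest mean squared discrepancy between the aggregate and any site."""
    agg = aggregate(weights, site_cates)
    return max(mean_sq_diff(agg, tau) for tau in site_cates)


def minimax_weights(site_cates, n_iter=500, tol=1e-12):
    """Simplex weights that approximately minimize worst_case_regret.

    site_cates: list of S lists, each holding one site's CATE predictions
    at the same n evaluation points (hypothetical inputs).
    Solver: Frank-Wolfe with exact line search on the dual of the
    minimum-enclosing-ball problem (an illustrative choice).
    """
    S = len(site_cates)
    q = [1.0 / S] * S  # start from equal weights
    for _ in range(n_iter):
        agg = aggregate(q, site_cates)
        dists = [mean_sq_diff(agg, tau) for tau in site_cates]
        worst = max(range(S), key=lambda s: dists[s])
        # Gap between the worst site and the weighted-average discrepancy;
        # it is zero at the optimum.
        gap = dists[worst] - sum(w * d for w, d in zip(q, dists))
        if gap <= tol:
            break
        # Exact line-search step toward the worst site, clipped to [0, 1].
        step = min(1.0, gap / (2.0 * dists[worst]))
        q = [(1.0 - step) * w for w in q]
        q[worst] += step
    return q


if __name__ == "__main__":
    # Three hypothetical sites, CATE predictions at two covariate points.
    sites = [[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]]
    q = minimax_weights(sites)
    print("weights:", q)
    print("worst-case regret:", worst_case_regret(q, sites))
    print("equal-weight worst case:", worst_case_regret([1 / 3] * 3, sites))
```

In this toy setting the final model is a weighted average of the site-specific predictions, which mirrors the form of the closed-form solution described above. The weights here come from an iterative solver rather than from the paper's derivation.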