Randomized controlled trials (RCTs) are the cornerstone of causal inference, yet extending their conclusions to a target population is difficult when treatment effects are heterogeneous and some subgroups are underrepresented in the trial. This paper addresses the problem of identifying and characterizing underrepresented subgroups in RCTs and proposes a framework for refining the target population to improve generalizability. We introduce an optimization-based approach, Rashomon Set of Optimal Trees (ROOT), to characterize underrepresented groups. ROOT optimizes the target subpopulation distribution by minimizing the variance of the target average treatment effect estimate, which yields more precise treatment effect estimates. ROOT also produces interpretable characterizations of the underrepresented population, which helps researchers communicate their findings. In synthetic data experiments, our approach shows improved precision and interpretability compared to alternatives. We apply the methodology to extend inferences from the Starting Treatment with Agonist Replacement Therapies (START) trial, which investigates the effectiveness of medication for opioid use disorder, to the real-world population represented by the Treatment Episode Dataset: Admissions (TEDS-A). By refining target populations with ROOT, the framework offers a systematic way to improve the accuracy of decision-making and to inform future trials in diverse populations.