Randomized controlled trials (RCTs) are the cornerstone for understanding causal effects, yet extending their inferences to target populations is challenging when treatment effects are heterogeneous and some subgroups are underrepresented in the trial. This paper addresses the problem of identifying and characterizing underrepresented subgroups in RCTs, and proposes a novel framework for refining target populations to improve generalizability. We introduce an optimization-based approach, Rashomon Set of Optimal Trees (ROOT), to characterize underrepresented groups. ROOT optimizes the target subpopulation distribution by minimizing the variance of the target average treatment effect estimate, yielding more precise treatment effect estimates. Notably, ROOT produces interpretable characteristics of the underrepresented population, helping researchers communicate these findings effectively. In synthetic data experiments, our approach shows improved precision and interpretability compared to alternatives. We apply our methodology to extend inferences from the Starting Treatment with Agonist Replacement Therapies (START) trial, which investigated the effectiveness of medication for opioid use disorder, to the real-world population represented by the Treatment Episode Dataset: Admissions (TEDS-A). By refining target populations with ROOT, our framework offers a systematic approach to improving decision-making accuracy and informing future trials in diverse populations.
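To make the variance-minimization idea concrete, here is a minimal sketch in Python. It is not the ROOT algorithm: the abstract does not specify the optimization procedure, so this illustration assumes a much simpler setting in which each trial unit has a hypothetical per-unit effect score (`scores`), and a single interpretable rule on one covariate is chosen to minimize the standard error of the mean score among the retained units. The function names, the `min_keep` parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def subgroup_se(scores, keep):
    """Standard error of the mean score over the units flagged in `keep`."""
    n_keep = int(keep.sum())
    if n_keep < 2:
        return np.inf
    return scores[keep].std(ddof=1) / np.sqrt(n_keep)


def best_single_split(X, scores, min_keep=20):
    """Exhaustively search rules of the form x_j <= t or x_j > t.

    Returns the rule whose retained subgroup has the smallest standard
    error, or rule=None if keeping every unit is already best.
    (Hypothetical helper: a one-split stand-in for a tree search.)
    """
    everyone = np.ones(len(scores), dtype=bool)
    best = {"se": subgroup_se(scores, everyone), "rule": None}
    for j in range(X.shape[1]):
        # Drop the largest value: "<= max" keeps everyone, "> max" keeps no one.
        for t in np.unique(X[:, j])[:-1]:
            for side in ("<=", ">"):
                keep = X[:, j] <= t if side == "<=" else X[:, j] > t
                if keep.sum() < min_keep:
                    continue
                se = subgroup_se(scores, keep)
                if se < best["se"]:
                    best = {"se": se, "rule": (j, side, float(t))}
    return best


# Synthetic illustration (assumed data, not from the paper):
# units with x0 > 1 have much noisier effect scores, as if that
# region of the covariate space were poorly represented in the trial.
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(0.0, 2.0, size=(n, 2))
noise_sd = np.where(X[:, 0] > 1.0, 5.0, 0.5)
scores = 1.0 + rng.normal(0.0, noise_sd)

baseline_se = subgroup_se(scores, np.ones(n, dtype=bool))
result = best_single_split(X, scores)
print("baseline SE:", baseline_se)
print("refined SE:", result["se"], "rule:", result["rule"])
```

The returned rule is a (feature index, side, threshold) triple, so the retained subpopulation can be described in plain terms, and its complement gives a readable description of the units excluded as too noisy. A tree-based search, as the ROOT name suggests, would allow rules combining several covariates; the single split here is only the simplest instance of that idea.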