Investigators often use multi-source data (e.g., multi-center trials, meta-analyses of randomized trials, pooled analyses of observational cohorts) to learn about the effects of interventions in subgroups of a well-defined target population. The target population can correspond to one of the data sources or to an external population in which treatment and outcome information may not be available. We develop and evaluate methods for using multi-source data to estimate subgroup potential outcome means and treatment effects in a target population. We consider identifiability conditions and propose doubly robust estimators that, under mild conditions, are non-parametrically efficient and allow nuisance functions to be estimated using flexible data-adaptive methods (e.g., machine learning techniques). We also show how to construct confidence intervals and simultaneous confidence bands for the estimated subgroup treatment effects. We examine the properties of the proposed estimators in simulation studies and compare their performance against alternative estimators. We also find that our methods perform well when the sample size of the target population is much larger than the sample size of the multi-source data. We illustrate the proposed methods in a meta-analysis of randomized trials for schizophrenia.