Transfer Learning With Efficient Estimators to Optimally Leverage Historical Data in Analysis of Randomized Trials

Randomized controlled trials (RCTs) are a cornerstone of comparative effectiveness research because they remove the confounding bias present in observational studies. However, RCTs are typically much smaller than observational studies because of financial and ethical considerations. It is therefore of great interest to incorporate plentiful observational data into the analysis of smaller RCTs. Previous estimators developed for this purpose rely on unrealistic additional assumptions, without which the added data can bias the effect estimate. Recent work proposed an alternative method (prognostic adjustment) that imposes no additional assumptions and increases efficiency in the analysis of RCTs. The idea is to use the observational data to learn a prognostic model: a regression of the outcome onto the covariates. The predictions from this model, generated from the RCT subjects' baseline variables, are then used as a covariate in a linear model. In this work, we extend this framework to inference with nonparametric efficient estimators in trial analysis. Using simulations, we find that this approach provides greater power (i.e., smaller standard errors) than the same analysis without prognostic adjustment, especially when the trial is small. We also find that the method is robust to observed or unobserved shifts between the observational and trial populations and does not introduce bias. Lastly, we showcase this estimator by leveraging real-world historical data in the analysis of a randomized blood transfusion study of trauma patients.
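As an illustration of the linear-model form of prognostic adjustment described above, the following is a minimal sketch under stated assumptions, not the estimator studied in this work. The data-generating process, the quadratic least-squares prognostic model (a stand-in for whatever learner would be used in practice), the sample sizes, and all function names are hypothetical. It fits a prognostic model on simulated historical controls, adds the resulting score as a covariate in the trial's linear model, and compares the spread of the adjusted and unadjusted effect estimates over repeated simulated trials.

```python
import numpy as np

def _features(X):
    # Hypothetical feature map: intercept, linear, and squared terms.
    return np.column_stack([np.ones(len(X)), X, X ** 2])

def fit_prognostic_model(X_hist, y_hist):
    # Regress the historical outcome on baseline covariates (least squares).
    beta, *_ = np.linalg.lstsq(_features(X_hist), y_hist, rcond=None)
    return lambda X: _features(X) @ beta

def simulate(n, rng, effect=1.0, randomize=True):
    # Assumed data-generating process, for illustration only.
    X = rng.normal(size=(n, 2))
    if randomize:
        a = rng.permutation(np.repeat([0, 1], n // 2))  # 1:1 allocation
    else:
        a = np.zeros(n, dtype=int)  # historical data: controls only
    y = X[:, 0] ** 2 + 2 * X[:, 1] + effect * a + rng.normal(scale=0.5, size=n)
    return X, a, y

def unadjusted_effect(y, a):
    # Difference in mean outcome between arms.
    return y[a == 1].mean() - y[a == 0].mean()

def adjusted_effect(y, a, score):
    # OLS of outcome on intercept, treatment, and prognostic score;
    # the treatment coefficient is the effect estimate.
    D = np.column_stack([np.ones(len(y)), a, score])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef[1]

def compare(n_trial=50, n_hist=2000, reps=500, seed=0):
    rng = np.random.default_rng(seed)
    X_h, _, y_h = simulate(n_hist, rng, randomize=False)
    predict = fit_prognostic_model(X_h, y_h)
    unadj, adj = [], []
    for _ in range(reps):
        X, a, y = simulate(n_trial, rng)
        unadj.append(unadjusted_effect(y, a))
        adj.append(adjusted_effect(y, a, predict(X)))
    # Empirical standard deviations of both estimators, and the adjusted mean.
    return np.std(unadj), np.std(adj), np.mean(adj)

if __name__ == "__main__":
    sd_unadj, sd_adj, mean_adj = compare()
    print(f"SD unadjusted: {sd_unadj:.3f}")
    print(f"SD adjusted:   {sd_adj:.3f}")
    print(f"Mean adjusted estimate: {mean_adj:.3f}")
```

Because treatment is randomized in the simulated trial, the prognostic score is independent of treatment assignment, so including it should reduce variance without shifting the estimate. The nonparametric efficient estimators that this work extends the framework to are not shown here.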
