Transfer Learning With Efficient Estimators to Optimally Leverage Historical Data in Analysis of Randomized Trials

Although randomized controlled trials (RCTs) are a cornerstone of comparative effectiveness research, they typically have much smaller sample sizes than observational studies because of financial and ethical considerations. There is therefore interest in using plentiful historical data (either observational data or prior trials) to reduce trial sizes. Previous estimators developed for this purpose rely on unrealistic assumptions, without which the added data can bias the treatment effect estimate. Recent work proposed an alternative method, prognostic covariate adjustment, that imposes no additional assumptions and increases efficiency in trial analyses. The idea is to use historical data to learn a prognostic model: a regression of the outcome on the covariates. The predictions from this model, generated from the RCT subjects' baseline variables, are then used as a covariate in a linear regression analysis of the trial data. In this work, we extend prognostic adjustment to trial analyses with nonparametric efficient estimators, which are more powerful than linear regression. We provide theory that explains why prognostic adjustment improves small-sample point estimation and inference without any possibility of bias. Simulations corroborate the theory: when the trial is small, efficient estimators with prognostic adjustment provide greater power (i.e., smaller standard errors) than the same estimators without it. Population shifts between historical and trial data attenuate the benefits but do not introduce bias. We showcase our estimator using clinical trial data provided by Novo Nordisk A/S that evaluate insulin therapy for individuals with type 2 diabetes.
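
The linear-regression form of prognostic covariate adjustment described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the simulated data-generating process, the sample sizes, the helper names (`simulate`, `features`, `fit_prognostic_model`, `ols_effect`), the use of control-only historical data, and the fixed basis expansion that stands in for whatever flexible learner one would use in practice. The sketch covers only the linear-regression analysis, not the nonparametric efficient estimators that this work studies.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_EFFECT = 1.0  # hypothetical treatment effect used in the simulation


def simulate(n, rng, randomize):
    """Hypothetical data-generating process, chosen only for illustration."""
    x = rng.normal(size=(n, 2))
    # Historical subjects are all untreated here (an assumption of this sketch).
    a = rng.integers(0, 2, size=n) if randomize else np.zeros(n, dtype=int)
    y = np.sin(x[:, 0]) + x[:, 1] ** 2 + TRUE_EFFECT * a
    y = y + rng.normal(scale=0.5, size=n)
    return x, a, y


def features(x):
    # Fixed basis expansion standing in for a flexible prognostic learner.
    return np.column_stack([np.ones(len(x)), x, x ** 2, np.sin(x)])


def fit_prognostic_model(x_hist, y_hist):
    """Regress the historical outcome on baseline covariates."""
    coef, *_ = np.linalg.lstsq(features(x_hist), y_hist, rcond=None)
    return lambda x: features(x) @ coef


def ols_effect(y, a, covariate=None):
    """OLS treatment coefficient and its model-based standard error."""
    cols = [np.ones(len(y)), a]
    if covariate is not None:
        cols.append(covariate)
    design = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[1], np.sqrt(cov[1, 1])


# Step 1: learn the prognostic model from (large) historical data.
x_hist, _, y_hist = simulate(5000, rng, randomize=False)
prognostic_model = fit_prognostic_model(x_hist, y_hist)

# Step 2: compute prognostic scores from the trial subjects' baseline data.
x_trial, a_trial, y_trial = simulate(100, rng, randomize=True)
score = prognostic_model(x_trial)

# Step 3: analyze the trial with and without the score as a covariate.
est_unadj, se_unadj = ols_effect(y_trial, a_trial)
est_adj, se_adj = ols_effect(y_trial, a_trial, covariate=score)

print(f"unadjusted: estimate={est_unadj:.3f}, SE={se_unadj:.3f}")
print(f"adjusted:   estimate={est_adj:.3f}, SE={se_adj:.3f}")
```

In this simulated setting the prognostic score explains most of the outcome variance, so the adjusted analysis should report a noticeably smaller standard error. Because treatment is randomized and the score depends only on baseline variables, including it does not change what the treatment coefficient estimates.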
