A Cross-Validated Targeted Maximum Likelihood Estimator for Data-Adaptive Experiment Selection Applied to the Augmentation of RCT Control Arms with External Data

February 20, 2023

Lauren Eyler Dang, Jens Magelund Tarp, Trine Julie Abrahamsen, Kajsa Kvist, John B Buse, Maya Petersen, Mark van der Laan

From arXiv, 27 pages, 4 figures

Augmenting the control arm of a randomized controlled trial (RCT) with external data may increase power at the risk of introducing bias. Existing data fusion estimators generally rely on stringent assumptions or may have decreased coverage or power in the presence of bias. We frame the problem as one of data-adaptive experiment selection, in which the potential experiments include the RCT only or the RCT combined with different candidate real-world datasets. To select and analyze the experiment with the optimal bias-variance tradeoff, we develop a novel experiment-selector cross-validated targeted maximum likelihood estimator (ES-CVTMLE). The ES-CVTMLE uses two bias estimates: 1) a function of the difference in conditional mean outcome under control between the RCT and combined experiments and 2) an estimate of the average treatment effect on a negative control outcome (NCO). We define the asymptotic distribution of the ES-CVTMLE under varying magnitudes of bias and construct confidence intervals by Monte Carlo simulation. In simulations involving violations of identification assumptions, the ES-CVTMLE had better coverage than test-then-pool approaches and an NCO-based bias adjustment approach, and higher power than one implementation of a Bayesian dynamic borrowing approach. We further demonstrate the ability of the ES-CVTMLE to distinguish biased from unbiased external controls through a re-analysis of the effect of liraglutide on glycemic control from the LEADER trial. The ES-CVTMLE has the potential to improve power while providing relatively robust inference for future hybrid studies that combine RCT data with real-world data (RWD).

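The experiment-selection idea in the abstract can be illustrated with a toy sketch. This is not the ES-CVTMLE itself: it omits cross-validation, targeted maximum likelihood estimation, the NCO-based bias estimate, and the Monte Carlo confidence intervals. It only shows what choosing among candidate experiments by a bias-variance tradeoff might look like. The selection criterion (estimated squared bias plus estimated variance), the `Candidate` class, the function names, and all numeric values are assumptions made up for illustration and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    """Summary of one candidate experiment (hypothetical inputs for this sketch)."""
    name: str
    bias_estimate: float      # estimated bias of the treatment effect estimate
    variance_estimate: float  # estimated variance of the treatment effect estimate


def estimated_mse(candidate: Candidate) -> float:
    """Squared bias plus variance: the tradeoff criterion assumed in this sketch."""
    return candidate.bias_estimate ** 2 + candidate.variance_estimate


def select_experiment(candidates: Sequence[Candidate]) -> Candidate:
    """Return the candidate with the smallest estimated mean squared error."""
    if not candidates:
        raise ValueError("at least one candidate experiment is required")
    return min(candidates, key=estimated_mse)


# Made-up illustrative values: pooling external data lowers the variance,
# but may add bias if the external controls differ from the RCT controls.
candidates = [
    Candidate("RCT only", bias_estimate=0.0, variance_estimate=0.04),
    Candidate("RCT + external A", bias_estimate=0.05, variance_estimate=0.02),
    Candidate("RCT + external B", bias_estimate=0.30, variance_estimate=0.015),
]

chosen = select_experiment(candidates)
print(chosen.name)
```

With these made-up numbers, the modestly biased dataset A is selected because its variance reduction outweighs its squared bias, while the heavily biased dataset B is rejected despite having the lowest variance. If every external candidate carried large estimated bias, the same rule would fall back to the RCT alone.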