Adjusting for (baseline) covariates with working regression models has become standard practice in the analysis of randomized clinical trials (RCTs). When the dimension $p$ of the covariates is large relative to the sample size $n$, specifically $p = o(n)$, adjusting for covariates by ordinary least squares, even in a linear working model, can yield overly large bias, defeating the purpose of improving efficiency. This issue arises when no structural assumptions are imposed on the outcome model, a scenario that we refer to as the assumption-lean setting. Several new estimators have been proposed to address this issue. However, they focus mainly on simple randomization under the finite-population model and do not cover covariate-adaptive randomization (CAR) schemes under the superpopulation model. Because it improves covariate balance between treatment groups, CAR is more widely adopted in RCTs, and the superpopulation model is a better fit when subjects are enrolled sequentially or when generalizing to a larger population is of interest. Thus, there is an urgent need to develop procedures for these settings, as current regulatory guidance provides little concrete direction. In this paper, we fill this gap by demonstrating that an adjusted estimator based on second-order $U$-statistics estimates the average treatment effect almost unbiasedly and enjoys a guaranteed efficiency gain if $p = o(n)$. In our analysis, we generalize the coupling technique commonly used in the CAR literature to $U$-statistics, and we obtain several useful results for analyzing inverse sample Gram matrices through a delicate leave-$m$-out analysis, which may be of independent interest. Both synthetic and semi-synthetic experiments are conducted to demonstrate the superior finite-sample performance of our new estimator compared to popular benchmarks.