In a clinical trial, random allocation aims to balance prognostic factors between arms, preventing true confounding. However, residual differences due to chance may introduce near-confounders. Adjusting for prognostic factors is therefore recommended, especially because of the related gain in statistical power. In this paper, we hypothesized that G-computation combined with machine learning could be a suitable method for randomized clinical trials, even with small sample sizes. It allows flexible estimation of the outcome model, even when the covariates' relationships with the outcome are complex. Through simulations, penalized regressions (Lasso, Elasticnet) and algorithm-based methods (neural network, support vector machine, super learner) were compared. Penalized regressions reduced variance but may introduce a slight increase in bias. The associated reductions in sample size ranged from 17\% to 54\%. In contrast, algorithm-based methods, while effective for larger and more complex data structures, underestimated the standard deviation, especially with small sample sizes. In conclusion, G-computation with penalized models, particularly Elasticnet with splines when appropriate, represents a relevant approach to increase the power of RCTs and account for potential near-confounders.
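The G-computation procedure the abstract describes can be sketched in a few lines: fit an outcome model including the treatment arm and prognostic covariates, predict every participant's outcome under each arm, and average the individual differences to obtain the marginal effect. Below is a minimal illustration with a penalized (Elasticnet) outcome model; the simulated data, coefficient values, and use of scikit-learn's `ElasticNetCV` are illustrative assumptions, not the paper's actual simulation design.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Illustrative simulated trial: 3 prognostic covariates, randomized arm A,
# continuous outcome with a true marginal treatment effect of 1.0
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
A = rng.integers(0, 2, size=n)
y = 1.0 * A + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=1.0, size=n)

# Step 1: fit the outcome model Q(A, X) with a penalized regression
design = np.column_stack([A, X])
model = ElasticNetCV(cv=5).fit(design, y)

# Step 2 (G-computation): predict each participant's outcome under both
# arms, then average the individual differences for the marginal effect
pred_treated = model.predict(np.column_stack([np.ones(n), X]))
pred_control = model.predict(np.column_stack([np.zeros(n), X]))
marginal_effect = np.mean(pred_treated - pred_control)
```

Because the penalty shrinks coefficients, the estimate may be slightly biased toward zero, which mirrors the bias-variance trade-off reported in the abstract; in practice, confidence intervals would be obtained by bootstrap rather than from the model directly.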