Recent studies \cite{carmon2019unlabeled,gowal2021improving,xing2022artificial} have demonstrated that incorporating additional real or generated data with pseudo-labels can enhance adversarial training through a two-stage training approach. In this paper, we theoretically analyze the asymptotic behavior of this method in high-dimensional linear regression. Although a double-descent phenomenon can be observed in ridgeless training, an appropriate $\mathcal{L}_2$ regularization allows the two-stage adversarial training to achieve better performance. Finally, we derive a shortcut cross-validation formula specifically tailored to the two-stage training method.
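
For concreteness, one way to write the two-stage procedure in a linear model is sketched here. The notation is assumed for illustration and is not taken from the cited works: $n_1$ labeled pairs $(x_i, y_i)$, $n_2$ additional unlabeled inputs $\tilde{x}_j$, an $\mathcal{L}_2$ attack of radius $\epsilon$, and ridge penalties $\lambda_0, \lambda \ge 0$. Whether the first stage is trained on clean or adversarial loss, and how the two samples are weighted in the second stage, are likewise assumptions of this sketch.
\begin{align*}
\text{Stage 1:}\quad & \hat{\theta}_0 = \arg\min_{\theta} \frac{1}{n_1}\sum_{i=1}^{n_1} \bigl(y_i - x_i^\top\theta\bigr)^2 + \lambda_0\|\theta\|_2^2, \qquad \tilde{y}_j = \tilde{x}_j^\top\hat{\theta}_0, \\
\text{Stage 2:}\quad & \hat{\theta} = \arg\min_{\theta} \frac{1}{n_1+n_2}\Bigl[\sum_{i=1}^{n_1} \max_{\|\delta\|_2\le\epsilon}\bigl(y_i - (x_i+\delta)^\top\theta\bigr)^2 + \sum_{j=1}^{n_2} \max_{\|\delta\|_2\le\epsilon}\bigl(\tilde{y}_j - (\tilde{x}_j+\delta)^\top\theta\bigr)^2\Bigr] + \lambda\|\theta\|_2^2.
\end{align*}
For a linear model the inner maximization has a closed form, since the worst-case perturbation aligns $\delta^\top\theta$ against the sign of the residual:
\[
\max_{\|\delta\|_2\le\epsilon}\bigl(y - (x+\delta)^\top\theta\bigr)^2 = \bigl(|y - x^\top\theta| + \epsilon\|\theta\|_2\bigr)^2 .
\]
In this notation, ridgeless training corresponds to the limit $\lambda \to 0$.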