Adversarial training has been proposed to hedge against adversarial attacks in machine learning and statistical models. This paper focuses on adversarial training under $\ell_\infty$-perturbation, which has recently attracted much research attention. The asymptotic behavior of the adversarial training estimator is investigated in the generalized linear model. The results imply that the limiting distribution of the adversarial training estimator under $\ell_\infty$-perturbation can put positive probability mass at $0$ when the true parameter is $0$, providing a theoretical guarantee of the associated sparsity-recovery ability. In addition, a two-step procedure, adaptive adversarial training, is proposed to further improve the performance of adversarial training under $\ell_\infty$-perturbation. Specifically, the proposed procedure can achieve asymptotic unbiasedness and variable-selection consistency. Numerical experiments are conducted to demonstrate the sparsity-recovery ability of adversarial training under $\ell_\infty$-perturbation and to compare the empirical performance of classic adversarial training and adaptive adversarial training.
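As a rough illustration of why $\ell_\infty$-perturbation is connected to sparsity, consider the simplest special case of a generalized linear model: linear regression with squared loss (an assumption made here for illustration; the abstract does not fix a specific loss). In that case the worst-case loss over $\|\delta\|_\infty \le \epsilon$ has the closed form $(|y - x^\top\beta| + \epsilon\|\beta\|_1)^2$, so an $\ell_1$-type term appears inside the objective. The sketch below checks this identity numerically; the function names and the test data are hypothetical and are not taken from the paper.

```python
import itertools
import numpy as np


def adversarial_squared_loss(x, y, beta, eps):
    """Closed-form worst-case squared loss over ||delta||_inf <= eps.

    Assumes linear regression with squared loss (illustrative special case).
    """
    return (abs(y - x @ beta) + eps * np.sum(np.abs(beta))) ** 2


def brute_force_adversarial_loss(x, y, beta, eps):
    """Maximize the loss by enumerating the corners of the l_inf ball.

    The loss is convex in delta, so its maximum over the box is at a corner.
    """
    corners = itertools.product([-eps, eps], repeat=len(x))
    return max((y - (x + np.array(d)) @ beta) ** 2 for d in corners)


# Hypothetical data: one observation with a sparse coefficient vector.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
beta = np.array([1.5, 0.0, -2.0, 0.5])
y = float(x @ beta + rng.normal())
eps = 0.1

closed = adversarial_squared_loss(x, y, beta, eps)
brute = brute_force_adversarial_loss(x, y, beta, eps)
print(np.isclose(closed, brute))
```

The brute-force search is only feasible in low dimension, since it visits all $2^p$ corners; it is included purely as a sanity check on the closed form.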