Adversarial training is a standard technique for training adversarially robust models. In this paper, we study adversarial training as an alternating best-response strategy in a two-player zero-sum game. We prove that even in a simple setting, with a linear classifier and a statistical model that abstracts robust and non-robust features, the alternating best-response strategy of such a game may not converge. On the other hand, the game has a unique pure Nash equilibrium, and this equilibrium is provably robust. We support our theoretical results with experiments showing the non-convergence of adversarial training and the robustness of the Nash equilibrium.
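The non-convergence of alternating best responses can be illustrated with a small deterministic toy. The sketch below is not the paper's statistical model or proof. It assumes that feature i equals mu_i * y plus unit-variance Gaussian noise, with y in {-1, +1} and balanced classes. It also assumes an adversary who can shift the input by any delta with an l-infinity norm of at most eps. Under these assumptions, features with mu_i > eps behave like robust features and features with mu_i < eps behave like non-robust ones. The function names are hypothetical. The sketch does not model the Nash equilibrium.

```python
# Toy illustration under ASSUMED conditions (not the paper's exact construction):
#   y in {-1, +1}, balanced classes; feature i is x_i = mu[i] * y + N(0, 1) noise.
#   The adversary may shift x by any delta with ||delta||_inf <= eps.


def sign(v: float) -> int:
    """Return +1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def adversary_best_response(w: list[float], eps: float) -> list[float]:
    # For a linear classifier, min over ||delta||_inf <= eps of y * w.(x + delta)
    # is attained at delta = -eps * y * sign(w). Here we return the resulting
    # shift of each class-conditional mean (in units of y): -eps * sign(w_i).
    return [-eps * sign(wi) for wi in w]


def classifier_best_response(mu: list[float], shift: list[float]) -> list[float]:
    # With identity covariance and balanced classes, the Bayes-optimal linear
    # classifier points along the perturbed class-conditional mean mu + shift.
    return [m + s for m, s in zip(mu, shift)]


def alternating_best_response(
    mu: list[float], eps: float, w0: list[float], steps: int
) -> list[list[float]]:
    """Alternate adversary and classifier best responses; return all iterates."""
    w = list(w0)
    history = [w]
    for _ in range(steps):
        shift = adversary_best_response(w, eps)  # adversary moves against current w
        w = classifier_best_response(mu, shift)  # classifier refits to the attack
        history.append(w)
    return history


if __name__ == "__main__":
    # One "robust" feature (mu = 2.0 > eps) and two "non-robust" ones (mu = 0.5 < eps).
    mu = [2.0, 0.5, 0.5]
    for t, w in enumerate(alternating_best_response(mu, eps=1.0, w0=mu, steps=4)):
        print(t, w)
    # 0 [2.0, 0.5, 0.5]
    # 1 [1.0, -0.5, -0.5]
    # 2 [1.0, 1.5, 1.5]
    # 3 [1.0, -0.5, -0.5]
    # 4 [1.0, 1.5, 1.5]
```

In this toy, the weight on the robust feature settles at mu_1 - eps. The weights on the non-robust features flip sign at every round, so the iterates cycle instead of converging.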