Adversarial training is one of the most popular methods for training models robust to adversarial attacks; however, it is not well understood from a theoretical perspective. We prove existence, regularity, and minimax theorems for adversarial surrogate risks. Our results explain some empirical observations on adversarial robustness from prior work and suggest new directions in algorithm development. Furthermore, our results extend previously known existence and minimax theorems for the adversarial classification risk to surrogate risks.