Adversarial attacks pose significant threats to deploying state-of-the-art classifiers in safety-critical applications. Two classes of methods have emerged to address this issue: empirical defences and certified defences. Although certified defences come with robustness guarantees, empirical defences such as adversarial training are far more popular among practitioners. In this paper, we systematically compare the standard and robust error of these two robust training paradigms across multiple computer vision tasks. We show that in most tasks, and for both $\ell_\infty$-ball and $\ell_2$-ball threat models, certified training with convex relaxations suffers from worse standard and robust error than adversarial training. We further explore how the error gap between certified and adversarial training depends on the threat model and the data distribution. In particular, besides the perturbation budget, we identify the shape of the perturbation set and the implicit margin of the data distribution as important factors. We support our arguments with extensive ablations on both synthetic and image datasets.
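For reference, the two error notions compared above can be written out as follows. This is a sketch using the conventional definitions from the robustness literature, not notation taken from the paper itself: the classifier $f$, the data distribution $\mathcal{D}$, the perturbation budget $\epsilon$, and the norm index $p \in \{2, \infty\}$ are symbols assumed here for illustration.

```latex
% Standard error: probability of misclassifying a clean input.
\mathrm{Err}(f) = \Pr_{(x,y)\sim\mathcal{D}}\bigl[f(x) \neq y\bigr]

% Robust error under an \ell_p-ball threat model with budget \epsilon:
% probability that some perturbation inside the ball changes the prediction
% away from the true label.
\mathrm{Err}_{\epsilon,p}(f) = \Pr_{(x,y)\sim\mathcal{D}}\bigl[\exists\,\delta:\ \|\delta\|_p \le \epsilon,\ f(x+\delta) \neq y\bigr]
```

Under these assumed definitions, $\mathrm{Err}(f) \le \mathrm{Err}_{\epsilon,p}(f)$ for every $\epsilon \ge 0$, since $\delta = 0$ always lies in the perturbation set. The shape of that set is determined by $p$ and its size by $\epsilon$.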