By guaranteeing the absence of adversarial examples in an instance's neighbourhood, certification mechanisms play an important role in demonstrating neural network robustness. In this paper, we ask whether these certifications can compromise the very models they help to protect. Our new \emph{Certification Aware Attack} exploits certifications to produce computationally efficient norm-minimising adversarial examples $74\%$ more often than comparable attacks, while reducing the median perturbation norm by more than $10\%$. While these attacks can be used to assess the tightness of certification bounds, they also highlight an apparent paradox: certifications can reduce security.