Adversarial attacks have been a looming yet unaddressed threat in industry. However, a decade of robustness evaluation literature has taught us that mounting a strong or optimal attack is challenging: it requires both machine learning and domain expertise. In other words, the white-box threat model, religiously assumed by the large majority of past work, is unrealistic. In this paper, we propose a new practical threat model in which the adversary relies on transfer attacks through publicly available surrogate models. We argue that this setting will become the most prevalent one for security-sensitive applications in the future. We evaluate transfer attacks in this setting and propose a specialized defense method based on a game-theoretic perspective. The defenses are evaluated against 24 public models and 11 attack algorithms across three datasets (CIFAR-10, CIFAR-100, and ImageNet). Under this threat model, our defense, PubDef, outperforms state-of-the-art white-box adversarial training by a large margin with almost no loss in normal accuracy. For instance, on ImageNet, our defense achieves 62% accuracy under the strongest transfer attack, compared with only 36% for the best adversarially trained model. Its accuracy when not under attack is only 2 percentage points lower than that of an undefended model (78% vs. 80%). We release our code at https://github.com/wagner-group/pubdef.
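To make the threat model concrete, the sketch below shows a transfer attack in miniature. The attacker never touches the target model. It computes perturbations from a public surrogate and submits them to the target. The sketch then reports accuracy under the "strongest transfer attack", which it takes to be the minimum target accuracy over all (surrogate, attack) pairs. That reading of the metric is an assumption. Everything else here is also a hypothetical stand-in rather than the paper's setup: tiny linear classifiers replace the 24 public models, a single FGSM step at two budgets replaces the 11 attack algorithms, and the names `W_TARGET`, `SURROGATES`, and `DATA` are illustrative. The sketch does not implement PubDef's game-theoretic defense.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Example = Tuple[Vector, int]  # (input, label in {-1, +1})


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def predict(w: Vector, x: Vector) -> int:
    """Hypothetical linear binary classifier: returns +1 or -1."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else -1


def fgsm_on_surrogate(w_sur: Vector, x: Vector, y: int, eps: float) -> Vector:
    """One-step L-inf attack computed ONLY from the surrogate's weights.
    For a linear logistic model, the loss gradient w.r.t. x points along
    -y * w_sur, so the FGSM step is x - eps * y * sign(w_sur)."""
    return tuple(xi - eps * y * sign(wi) for xi, wi in zip(x, w_sur))


def accuracy(w_target: Vector, data: Sequence[Example]) -> float:
    return sum(predict(w_target, x) == y for x, y in data) / len(data)


def transfer_accuracy(w_target: Vector, w_sur: Vector,
                      data: Sequence[Example], eps: float) -> float:
    """Craft adversarial inputs on the surrogate, evaluate them on the target."""
    adv: List[Example] = [(fgsm_on_surrogate(w_sur, x, y, eps), y) for x, y in data]
    return accuracy(w_target, adv)


def strongest_transfer_accuracy(
    w_target: Vector,
    surrogates: Dict[str, Vector],
    eps_values: Sequence[float],
    data: Sequence[Example],
) -> Tuple[float, Tuple[str, float]]:
    """Assumed metric: the lowest target accuracy over all (surrogate, attack) pairs."""
    results: Dict[Tuple[str, float], float] = {}
    for name, w_sur in surrogates.items():
        for eps in eps_values:
            results[(name, eps)] = transfer_accuracy(w_target, w_sur, data, eps)
    worst = min(results, key=results.get)
    return results[worst], worst


# Hypothetical toy setup (not from the paper).
W_TARGET: Vector = (1.0, 1.0)
SURROGATES: Dict[str, Vector] = {
    "aligned": (1.0, 0.5),      # same gradient signs as the target: transfers well
    "misaligned": (1.0, -1.0),  # perturbation is orthogonal to the target's weights
}
DATA: List[Example] = [
    ((0.5, 0.5), 1), ((-0.5, -0.5), -1),
    ((1.5, 0.5), 1), ((-0.5, -1.5), -1),
]

if __name__ == "__main__":
    print("clean accuracy:", accuracy(W_TARGET, DATA))
    acc, worst = strongest_transfer_accuracy(W_TARGET, SURROGATES, (0.3, 0.6), DATA)
    print("strongest transfer attack:", worst, "accuracy:", acc)
```

In this toy setup, only the surrogate whose gradients align with the target's transfers, and only at the larger budget. It halves the target's accuracy, while the misaligned surrogate has no effect. A defense tuned to this threat model therefore needs to anticipate which public surrogates an attacker is likely to use.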