Adversarial attacks have been a looming and unaddressed threat in the industry. However, over a decade of robustness evaluation research has taught us that mounting a strong or optimal attack is challenging: it requires both machine learning and domain expertise. In other words, the white-box threat model, assumed by the vast majority of prior work, is unrealistic. In this paper, we propose a new practical threat model in which the adversary relies on transfer attacks crafted on publicly available surrogate models. We argue that this setting will become the most prevalent one for security-sensitive applications in the future. We evaluate transfer attacks in this setting and propose a specialized defense method based on a game-theoretic perspective. The defenses are evaluated against 24 public models and 11 attack algorithms across three datasets (CIFAR-10, CIFAR-100, and ImageNet). Under this threat model, our defense, PubDef, outperforms state-of-the-art white-box adversarial training by a large margin with almost no loss in clean accuracy. For instance, on ImageNet, our defense achieves 62% accuracy under the strongest transfer attack, compared with only 36% for the best adversarially trained model. Its accuracy when not under attack is only 2 percentage points lower than that of an undefended model (78% vs. 80%). We release our code at https://github.com/wagner-group/pubdef.