The bulk of existing research on defending against adversarial examples focuses on a single (typically bounded Lp-norm) attack, but in practical settings, machine learning (ML) models should be robust to a wide variety of attacks. In this paper, we present the first unified framework for considering multiple attacks against ML models. Our framework can model different levels of the learner's knowledge about the test-time adversary, allowing us to model robustness against unforeseen attacks and robustness against unions of attacks. Using our framework, we present the first leaderboard, MultiRobustBench, for benchmarking multiattack evaluation, which captures performance across attack types and attack strengths. We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types, including Lp-based threat models, spatial transformations, and color changes, at 20 different attack strengths (180 attacks total). Additionally, we analyze the state of current defenses against multiple attacks. Our analysis shows that while existing defenses have made progress in terms of average robustness across the set of attacks used, robustness against the worst-case attack remains a major open problem, as all existing models perform worse than random guessing.
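The gap between average and worst-case robustness described in the abstract can be illustrated with a small sketch. It is not the scoring used by MultiRobustBench: the function name `multiattack_summary`, the attack labels, and the toy data are all hypothetical, and the three summaries shown (mean per-attack accuracy, accuracy under the single strongest attack, and accuracy against the union of attacks) are only one plausible way to make the distinction concrete.

```python
from statistics import mean

def multiattack_summary(correct):
    """Summarize robustness over a set of attacks.

    `correct` maps an attack name to a list of booleans, one per test
    example, that is True when the model still classifies that example
    correctly under the attack. All lists must have the same length.
    """
    # Robust accuracy under each attack taken separately.
    per_attack = {
        name: mean(1.0 if ok else 0.0 for ok in flags)
        for name, flags in correct.items()
    }
    # Average robustness: mean of the per-attack accuracies.
    average = mean(per_attack.values())
    # Accuracy under the single most damaging attack.
    worst_single = min(per_attack.values())
    # Union robustness: an example counts only if it survives every attack.
    union = mean(1.0 if all(col) else 0.0 for col in zip(*correct.values()))
    return {
        "per_attack": per_attack,
        "average": average,
        "worst_single": worst_single,
        "union": union,
    }

# Hypothetical outcomes for 4 test examples under 3 made-up attacks.
toy = {
    "linf": [True, True, False, True],
    "spatial": [True, False, True, True],
    "recolor": [False, True, True, True],
}

summary = multiattack_summary(toy)
print(summary["average"], summary["worst_single"], summary["union"])
```

In this toy case each attack alone leaves 3 of 4 examples correct, so the average is 0.75, yet only 1 of 4 examples survives all three attacks, so the union accuracy is 0.25. A model can therefore look reasonable on average while remaining weak against an adversary free to choose the most damaging attack per example.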