The bulk of existing research on defending against adversarial examples focuses on a single attack (typically a bounded Lp-norm attack), but in practical settings, machine learning (ML) models should be robust to a wide variety of attacks. In this paper, we present the first unified framework for considering multiple attacks against ML models. Our framework can model different levels of the learner's knowledge about the test-time adversary, allowing us to model both robustness against unforeseen attacks and robustness against unions of attacks. Using our framework, we present MultiRobustBench, the first leaderboard for multiattack evaluation, which captures performance across attack types and attack strengths. We evaluate the robustness of 16 defended models against a set of 9 different attack types, including Lp-based threat models, spatial transformations, and color changes, at 20 different attack strengths (180 attacks total). Additionally, we analyze the state of current defenses against multiple attacks. Our analysis shows that while existing defenses have made progress in terms of average robustness across the set of attacks used, robustness against the worst-case attack remains a major open problem: all existing models perform worse than random guessing.
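The gap between average robustness and worst-case robustness can be made concrete with a small sketch. It assumes that, for every attack in the evaluated set, we already have a per-example record of whether the model's prediction survives that attack. The function name `multiattack_robustness` and the toy matrix are hypothetical illustrations, not MultiRobustBench's scoring code. The sketch reports the mean of the per-attack accuracies, the accuracy against the single strongest attack, and a union-style score in which an example counts only if it survives every attack.

```python
from typing import Dict, List


def multiattack_robustness(correct: List[List[bool]]) -> Dict[str, float]:
    """Summarize robustness over a set of attacks (illustrative sketch only).

    correct[a][i] is True when the model still classifies test example i
    correctly under attack a (one row per attack type/strength pair).
    """
    n_attacks = len(correct)
    n_examples = len(correct[0])

    # Robust accuracy against each attack evaluated on its own.
    per_attack = [sum(row) / n_examples for row in correct]

    # Average robustness: mean of the per-attack accuracies.
    average = sum(per_attack) / n_attacks

    # Accuracy against the single strongest attack in the set.
    worst_single = min(per_attack)

    # Union (per-example worst case): an example counts only if it
    # survives every attack in the set.
    survivors = sum(
        all(correct[a][i] for a in range(n_attacks)) for i in range(n_examples)
    )
    union = survivors / n_examples

    return {"average": average, "worst_single_attack": worst_single, "union": union}


if __name__ == "__main__":
    # Hypothetical toy data: 3 attacks x 4 test examples.
    toy = [
        [True, True, True, False],
        [True, True, False, True],
        [True, False, True, True],
    ]
    print(multiattack_robustness(toy))
    # {'average': 0.75, 'worst_single_attack': 0.75, 'union': 0.25}
```

In this toy case each attack on its own leaves 75% of the examples correctly classified, yet only one example in four survives all three attacks. A model can therefore look reasonably robust on average while its worst-case robustness is much lower.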