Adversarial robustness research primarily focuses on L_p perturbations, and most defenses are developed with identical training-time and test-time adversaries. However, in real-world applications developers are unlikely to have access to the full range of attacks or corruptions their system will face. Furthermore, worst-case inputs are likely to be diverse and need not be constrained to the L_p ball. To narrow in on this discrepancy between research and reality, we introduce ImageNet-UA, a framework for evaluating model robustness against a range of unforeseen adversaries, including eighteen new non-L_p attacks. To perform well on ImageNet-UA, defenses must overcome a generalization gap and be robust to a diverse range of attacks not encountered during training. In extensive experiments, we find that existing robustness measures do not capture unforeseen robustness, that standard robustness techniques are outperformed by alternative training strategies, and that novel methods can improve unforeseen robustness. We present ImageNet-UA as a useful tool for the community for improving the worst-case behavior of machine learning systems.