With the real world relying ever more on machine learning (ML) models, adversarial examples threaten the safety of AI-based systems such as autonomous vehicles. In the image domain, they are maliciously perturbed data points that look benign to humans (i.e., the modification is not noticeable) yet severely mislead state-of-the-art ML models. Traditionally, researchers ensured the imperceptibility of such altered data points by restricting perturbations via $\ell_p$ norms. Recent publications, however, claim that natural-looking adversarial examples can also be created without such restrictions. With far more freedom to embed malicious information in the data, these unrestricted adversarial examples can potentially bypass traditional defenses, since they are not constrained to the patterns those defenses are designed to recognize and mitigate; attackers can thus operate outside the expected threat model. Surveying existing image-based methods, however, we found that human evaluations of the proposed image modifications are largely missing. Building on existing human-assessment frameworks for image-generation quality, we propose SCOOTER, an evaluation framework for unrestricted image-based attacks that lets researchers analyze how imperceptible their attacks truly are. It provides guidelines for conducting statistically significant human experiments, standardized questions, and a ready-to-use implementation.
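For concreteness, the restricted threat model mentioned above is commonly formalized as follows (a standard formulation added here for illustration; the symbols $f$, $x$, $\delta$, and $\epsilon$ are generic and not defined in this abstract): given a classifier $f$ and an input $x$, the attacker seeks a perturbation $\delta$ with
\[
f(x + \delta) \neq f(x) \quad \text{subject to} \quad \|\delta\|_p \leq \epsilon,
\]
where $\epsilon$ bounds the $\ell_p$ norm of the perturbation. Unrestricted attacks drop this norm constraint and instead rely on human judgment of naturalness, which is precisely what SCOOTER is designed to measure.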