Deep neural networks have been shown to be vulnerable to adversarial images. Conventional attacks strive for indistinguishable adversarial images with strictly restricted perturbations. Recently, researchers have begun to explore distinguishable yet non-suspicious adversarial images and have demonstrated that color transformation attacks are effective. In this work, we propose Adversarial Color Filter (AdvCF), a novel color transformation attack that is optimized with gradient information in the parameter space of a simple color filter. In particular, our color filter space is explicitly specified, which allows us to provide a systematic analysis of model robustness against adversarial color transformations from both the attack and defense perspectives. In contrast, existing color transformation attacks do not offer the opportunity for systematic analysis because they lack such an explicit space. We further demonstrate the effectiveness of AdvCF in fooling image classifiers and compare it with other color transformation attacks in terms of robustness to defenses and, through an extensive user study, image acceptability. We also highlight the human-interpretability of AdvCF and show its superiority over the state-of-the-art human-interpretable color transformation attack in both image acceptability and efficiency. Additional results provide interesting new insights into model robustness against AdvCF in three other visual tasks.
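To make the idea of optimizing an attack in the parameter space of a color filter concrete, the following is a minimal sketch under stated assumptions. The abstract does not specify the form of the filter, so the per-channel gain-and-bias filter, the toy linear classifier on mean color, and all names and default values (`apply_filter`, `toy_logits`, `color_filter_attack`, `max_shift`, `step_size`) are hypothetical stand-ins chosen for illustration. This is not the AdvCF filter or its implementation.

```python
import numpy as np

def apply_filter(image, gain, bias):
    """Assumed filter: per-channel affine map on an (H, W, 3) image in [0, 1]."""
    return np.clip(image * gain + bias, 0.0, 1.0)

def toy_logits(image, weights, offset):
    """Stand-in classifier: a linear layer on the mean color of the image."""
    return weights @ image.mean(axis=(0, 1)) + offset

def loss_and_grads(image, label, gain, bias, weights, offset):
    """Cross-entropy loss and its gradient with respect to the filter parameters."""
    pre = image * gain + bias
    logits = toy_logits(np.clip(pre, 0.0, 1.0), weights, offset)
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    loss = -np.log(probs[label])
    d_logits = probs.copy()
    d_logits[label] -= 1.0                   # softmax minus one-hot
    d_mean = weights.T @ d_logits            # gradient w.r.t. mean color, shape (3,)
    unclipped = (pre > 0.0) & (pre < 1.0)    # clipping blocks the gradient elsewhere
    grad_gain = d_mean * (image * unclipped).mean(axis=(0, 1))
    grad_bias = d_mean * unclipped.mean(axis=(0, 1))
    return loss, grad_gain, grad_bias

def color_filter_attack(image, label, weights, offset,
                        steps=20, step_size=0.02, max_shift=0.3):
    """Signed gradient ascent on the loss over six filter parameters.

    Parameters are projected back into a box around the identity filter,
    and the highest-loss parameters seen so far are returned.
    """
    gain, bias = np.ones(3), np.zeros(3)
    best_loss, best_gain, best_bias = -np.inf, gain, bias
    for _ in range(steps + 1):
        loss, g_gain, g_bias = loss_and_grads(image, label, gain, bias, weights, offset)
        if loss > best_loss:
            best_loss, best_gain, best_bias = loss, gain, bias
        gain = np.clip(gain + step_size * np.sign(g_gain), 1 - max_shift, 1 + max_shift)
        bias = np.clip(bias + step_size * np.sign(g_bias), -max_shift, max_shift)
    return best_gain, best_bias, best_loss
```

Because the search runs over a handful of named parameters rather than over every pixel, the resulting perturbation can be read off directly (for example, "more red gain, less blue bias"), which illustrates why an explicitly specified filter space lends itself to systematic analysis. A real attack would replace the toy classifier with a trained network and obtain the gradients by automatic differentiation.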