While adversarial perturbation of images to attack deep image classification models poses serious security concerns in practice, this paper suggests a novel paradigm, which we call amicable aid, in which the concept of image perturbation can benefit classification performance. We show that by taking the opposite search direction of perturbation, an image can be modified to yield higher classification confidence, and even a misclassified image can be made correctly classified. This can also be achieved with a large amount of perturbation, by which the image is made unrecognizable to human eyes. The mechanism of the amicable aid is explained from the viewpoint of the underlying natural image manifold. Furthermore, we investigate the universal amicable aid, in which a fixed perturbation is applied to multiple images to improve their classification results. While it is challenging to find such perturbations, we show that making the decision boundary as perpendicular to the image manifold as possible, via training with modified data, is effective in obtaining a model for which universal amicable perturbations are more easily found.
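The idea of taking the opposite search direction can be illustrated with a minimal sketch. It assumes a toy linear softmax classifier with random weights rather than the deep models studied in the paper, and it uses an iterative sign-gradient update as one plausible way to search within an L-infinity budget. The names `amicable_perturb`, `loss_grad_wrt_input`, `eps`, and `steps` are illustrative and do not come from the paper.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def loss_grad_wrt_input(W, b, x, label):
    """Gradient of the cross-entropy loss w.r.t. the input x
    for a linear softmax classifier with logits W @ x + b."""
    p = softmax(W @ x + b)
    p[label] -= 1.0          # dL/dlogits = p - onehot(label)
    return W.T @ p


def amicable_perturb(W, b, x, label, eps=0.1, steps=10):
    """Iteratively step AGAINST the loss gradient (an adversarial attack
    would step along it), keeping the result within eps of x in L-inf norm.
    Assumption: sign-gradient steps of size eps / steps."""
    alpha = eps / steps
    x_aid = x.copy()
    for _ in range(steps):
        g = loss_grad_wrt_input(W, b, x_aid, label)
        x_aid = x_aid - alpha * np.sign(g)        # minus sign: decrease the loss
        x_aid = np.clip(x_aid, x - eps, x + eps)  # stay inside the budget
    return x_aid


# Toy setup (hypothetical): 3 classes, 8-dimensional "images".
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))
b = np.zeros(3)
x = rng.normal(size=8)
label = 0

x_aid = amicable_perturb(W, b, x, label)
conf_before = softmax(W @ x + b)[label]
conf_after = softmax(W @ x_aid + b)[label]
print(f"confidence for true class: {conf_before:.4f} -> {conf_after:.4f}")
```

Flipping the minus sign in the update to a plus sign turns the same loop into a standard iterative adversarial attack, which is the sense in which the amicable aid searches in the opposite direction.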