Deep neural networks for image classification remain vulnerable to adversarial examples -- small, imperceptible perturbations that induce misclassifications. In black-box settings, where only the final prediction is accessible, crafting targeted attacks that aim to misclassify an input into a specific target class is particularly challenging due to narrow decision regions. Current state-of-the-art methods often exploit the geometric properties of the decision boundary separating a source image and a target image rather than incorporating information from the images themselves. In contrast, we propose the Targeted Edge-informed Attack (TEA), a novel attack that uses edge information from the target image to carefully perturb it, producing an adversarial image that is closer to the source image while still achieving the desired target classification. Our approach consistently outperforms current state-of-the-art methods across different models in low-query settings, requiring nearly 70% fewer queries; this scenario is especially relevant in real-world applications with limited query budgets and black-box access. Furthermore, by efficiently generating a suitable adversarial example, TEA provides an improved target initialization for established geometry-based attacks.
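The abstract compresses the method into a single sentence, so the toy Python sketch below illustrates one plausible reading of the idea rather than the paper's actual algorithm: edge pixels of the target image are preserved while non-edge content is blended toward the source image, with a binary search on the blend factor driven by black-box queries. The `edge_mask` helper, the Sobel-based edge detector, the `query` interface, and the blending scheme are all illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def edge_mask(img, threshold=0.1):
    # Hypothetical edge detector: Sobel gradient magnitude on a grayscale view.
    gray = img.mean(axis=-1)
    gx = ndimage.sobel(gray, axis=0)
    gy = ndimage.sobel(gray, axis=1)
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-12) > threshold

def tea_sketch(x_src, x_tgt, target_class, query, budget=100):
    """Toy edge-informed targeted attack (illustrative, not the paper's algorithm).

    x_src, x_tgt : float arrays of shape (H, W, C) in [0, 1]
    query(img)   : black-box model returning only the predicted class label
    """
    keep = edge_mask(x_tgt)[..., None]  # preserve the target image's edge structure
    lo, hi = 0.0, 1.0                   # blend factor toward the source image
    best = x_tgt                        # x_tgt is classified as target_class by assumption
    for _ in range(budget):
        alpha = 0.5 * (lo + hi)
        # Replace non-edge content of the target image with source content.
        cand = np.where(keep, x_tgt, (1.0 - alpha) * x_tgt + alpha * x_src)
        if query(cand) == target_class:
            best, lo = cand, alpha      # still in the target region: move closer to source
        else:
            hi = alpha                  # left the target region: back off
    return best                         # image close to x_src, still labeled target_class
```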