Recent advances in deep learning have achieved remarkable results across many tasks in computer vision (CV) and natural language processing (NLP). At the intersection of CV and NLP lies image captioning, where model robustness against adversarial attacks has not been well studied. In this paper, we present a novel adversarial attack strategy, AICAttack (Attention-based Image Captioning Attack), designed to attack image captioning models through subtle perturbations of images. Operating in a black-box scenario, our algorithm requires no access to the target model's architecture, parameters, or gradient information. We introduce an attention-based candidate selection mechanism that identifies the optimal pixels to attack, followed by Differential Evolution (DE) to perturb those pixels' RGB values. We demonstrate AICAttack's effectiveness through extensive experiments on benchmark datasets with multiple victim models. The results show that our method surpasses current leading-edge techniques by effectively disrupting the alignment and semantics of words in the output captions.
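The attack loop described above can be sketched as a standard DE/rand/1/bin search over RGB offsets for a small set of pre-selected pixels. This is only a minimal illustration, not the paper's implementation: the function names (`attack_score`, `de_attack`) are hypothetical, and the real objective would query the black-box victim captioning model (e.g., measuring caption-quality degradation), whereas the toy score here exists purely so the sketch runs.

```python
import numpy as np

rng = np.random.default_rng(0)

def attack_score(image, perturbation, pixels):
    """Stand-in for the black-box objective: lower means the perturbed
    image pushes the caption further from the original. In AICAttack this
    would be a caption-similarity score from the victim model; here we use
    an arbitrary toy score (hypothetical) so the sketch is self-contained."""
    img = image.copy()
    for (x, y), rgb in zip(pixels, perturbation.reshape(-1, 3)):
        img[y, x] = np.clip(img[y, x] + rgb, 0, 255)
    return float(np.abs(img[..., 0] - 128.0).mean())

def de_attack(image, pixels, pop_size=20, iters=50, F=0.5, CR=0.9, bound=30.0):
    """Classic DE/rand/1/bin over RGB offsets for the selected pixels.
    `pixels` would come from the attention-based candidate selection step."""
    dim = len(pixels) * 3  # one (dR, dG, dB) triple per attacked pixel
    pop = rng.uniform(-bound, bound, size=(pop_size, dim))
    scores = np.array([attack_score(image, ind, pixels) for ind in pop])
    for _ in range(iters):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            idx = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            a, b, c = pop[idx]
            mutant = np.clip(a + F * (b - c), -bound, bound)
            # Binomial crossover with at least one mutant gene.
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection: keep the trial if it lowers the score.
            s = attack_score(image, trial, pixels)
            if s < scores[i]:
                pop[i], scores[i] = trial, s
    best = int(np.argmin(scores))
    return pop[best], float(scores[best])
```

Because DE only compares objective values, this search needs neither gradients nor model internals, which is what makes the black-box setting in the abstract feasible; the `bound` parameter caps per-channel perturbation so the change stays visually subtle.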