The transferability of adversarial examples is a crucial aspect of evaluating the robustness of deep learning systems, particularly in black-box scenarios. Although several methods have been proposed to enhance cross-model transferability, little attention has been paid to the transferability of adversarial examples across different tasks. This issue has become increasingly relevant with the emergence of multi-task foundation AI systems such as Visual ChatGPT, against which adversarial examples crafted for a single task are of limited use. Furthermore, these systems often perform inference that goes beyond recognition-style tasks. To address this gap, we propose VRAP, a novel Visual Relation-based cross-task Adversarial Patch generation method for evaluating the robustness of various visual tasks, especially those involving visual reasoning, such as Visual Question Answering and Image Captioning. VRAP uses scene graphs to combine object recognition-based deception with predicate-based relation elimination, thereby disrupting the visual reasoning information shared among inferential tasks. Our extensive experiments demonstrate that VRAP significantly surpasses previous methods in black-box transferability across diverse visual reasoning tasks.
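To make the idea of combining the two attack objectives more concrete, here is a toy sketch. It is not VRAP's actual formulation. It assumes a hypothetical scene-graph model reduced to two linear heads over a flattened image: `W_obj`, which scores object classes, and `W_rel`, which scores predicates. It then optimizes a masked patch so that the cross-entropy of the true object label rises (object deception) while every predicate score falls (relation elimination). The weighting `lam`, the signed-gradient update, and all function names are illustrative assumptions.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def apply_patch(x, patch, mask):
    # Pixels inside the mask are replaced by the patch; all other pixels are untouched.
    return (1.0 - mask) * x + mask * patch


def combined_loss_and_grad(x, patch, mask, W_obj, W_rel, y, lam):
    """Toy objective to MINIMIZE: -CE(object head, true label y) + lam * sum(sigmoid(relation logits)).

    W_obj and W_rel are hypothetical linear stand-ins for a scene-graph model's heads.
    """
    xp = apply_patch(x, patch, mask)
    probs = softmax(W_obj @ xp)
    ce = -np.log(probs[y])
    onehot = np.zeros_like(probs)
    onehot[y] = 1.0
    rel_scores = sigmoid(W_rel @ xp)
    loss = -ce + lam * rel_scores.sum()
    # Chain rule through the linear heads; d(xp)/d(patch) is the mask itself.
    grad_xp = -W_obj.T @ (probs - onehot) + lam * (W_rel.T @ (rel_scores * (1.0 - rel_scores)))
    return loss, mask * grad_xp


def optimize_patch(x, mask, W_obj, W_rel, y, lam=1.0, step=0.01, iters=200):
    patch = np.full_like(x, 0.5)  # start from a mid-gray patch
    for _ in range(iters):
        _, grad = combined_loss_and_grad(x, patch, mask, W_obj, W_rel, y, lam)
        # Signed-gradient descent step, projected back onto the valid pixel range [0, 1].
        patch = np.clip(patch - step * np.sign(grad), 0.0, 1.0)
    return patch


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, K, R = 64, 10, 5               # pixels, object classes, predicates (toy sizes)
    x = rng.random(D)
    mask = np.zeros(D)
    mask[:16] = 1.0                   # the patch covers the first 16 pixels
    W_obj = rng.normal(size=(K, D))
    W_rel = rng.normal(size=(R, D))
    y = int(np.argmax(W_obj @ x))     # treat the clean prediction as the true label
    before, _ = combined_loss_and_grad(x, np.full_like(x, 0.5), mask, W_obj, W_rel, y, 1.0)
    patch = optimize_patch(x, mask, W_obj, W_rel, y)
    after, _ = combined_loss_and_grad(x, patch, mask, W_obj, W_rel, y, 1.0)
    print(f"combined loss: {before:.3f} -> {after:.3f}")
```

A real attack would backpropagate through an actual scene-graph generator using an autodiff framework. The closed-form gradients here work only because the heads are linear.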