Typographic attacks, which paste misleading text onto an image, have been shown to harm the performance of Vision-Language Models like CLIP. However, the susceptibility of recent Large Vision-Language Models (LVLMs) to these attacks remains understudied. Furthermore, prior typographic attacks against CLIP randomly sample a misleading class from a predefined set of categories; this simple strategy misses more effective attacks that exploit the stronger language skills of LVLMs. To address these issues, we first introduce a benchmark for testing typographic attacks against LVLMs. Moreover, we introduce two novel, more effective \textit{Self-Generated} attacks that prompt the LVLM to generate an attack against itself: 1) a Class-Based Attack, where the LVLM (e.g., LLaVA) is asked which deceiving class is most similar to the target class, and 2) a Descriptive Attack, where a more advanced LVLM (e.g., GPT-4V) is asked to recommend a typographic attack that includes both a deceiving class and a description. Using our benchmark, we find that Self-Generated attacks pose a significant threat, reducing LVLM classification performance by up to 33\%. We also find that attacks generated by one model (e.g., GPT-4V or LLaVA) are effective both against that model itself and against other models such as InstructBLIP and MiniGPT4. Code: \url{https://github.com/mqraitem/Self-Gen-Typo-Attack}
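As a rough illustration of the two Self-Generated attack variants, the attacker-side prompts might be constructed along these lines. This is a minimal sketch: the exact wording, the function names, and the idea that the model's reply is then rendered as text onto the image are assumptions for illustration, not the paper's actual prompts.

```python
def class_based_attack_prompt(target_class: str, candidates: list[str]) -> str:
    """Class-Based Attack: ask the victim LVLM (e.g., LLaVA) which
    incorrect class would be most deceiving for the target class.
    (Hypothetical wording, not the paper's prompt.)"""
    return (
        f"An image shows a '{target_class}'. From the following classes, "
        f"which one is most similar to '{target_class}' yet incorrect, so that "
        f"writing its name on the image would be most misleading? "
        f"Candidates: {', '.join(candidates)}."
    )


def descriptive_attack_prompt(target_class: str) -> str:
    """Descriptive Attack: ask a stronger LVLM (e.g., GPT-4V) to recommend
    both a deceiving class and a short supporting description.
    (Hypothetical wording, not the paper's prompt.)"""
    return (
        f"Recommend a typographic attack against an image of a '{target_class}': "
        f"give one deceiving class name plus a one-sentence description of it, "
        f"to be pasted as text onto the image."
    )
```

In both cases, the returned string would be sent to the LVLM, and the model's answer (a class name, optionally with a description) would then be overlaid on the target image as the typographic attack.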