Recently, significant progress has been made on Large Vision-Language Models (LVLMs), a new class of VL models that make use of large pre-trained language models. Yet their vulnerability to typographic attacks, which involve superimposing misleading text onto an image, remains unstudied. Furthermore, prior typographic attacks rely on sampling a random misleading class from a predefined set of classes, and the randomly chosen class might not be the most effective attack. To address these issues, we first introduce a novel benchmark designed specifically to test LVLMs' vulnerability to typographic attacks. We then introduce a new and more effective typographic attack: Self-Generated typographic attacks. Given an image, our method makes use of the strong language capabilities of models like GPT-4V by simply prompting them to recommend a typographic attack. Using our benchmark, we find that typographic attacks represent a significant threat to LVLMs. We also find that typographic attacks recommended by GPT-4V using our method are more effective than attacks from prior work, not only against GPT-4V itself but also against a host of less capable yet popular open-source models such as LLaVA, InstructBLIP, and MiniGPT4.
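To make the contrast between the two attack styles concrete, the following is a minimal sketch and not the paper's implementation. The function names, the example class list, and in particular the prompt wording are hypothetical and chosen only for illustration. The first function mirrors the random-class baseline described above; the second builds a text prompt that could be sent, together with the image, to a model such as GPT-4V to ask it to recommend the misleading class. Rendering the chosen text onto the image and calling the model are left out.

```python
import random


def random_class_attack(true_label, classes, rng):
    """Baseline: sample a random misleading class that differs from the true label."""
    candidates = [c for c in classes if c != true_label]
    if not candidates:
        raise ValueError("need at least one class other than the true label")
    return rng.choice(candidates)


def self_generated_attack_prompt(classes):
    """Build a prompt asking the model itself to recommend the attack text.

    The wording below is an assumption for illustration, not the paper's prompt.
    """
    options = ", ".join(classes)
    return (
        "Look at the image. Which one of the following classes, if written as "
        "text on the image, would be most likely to mislead a model about what "
        f"the image shows? Options: {options}. Answer with one class only."
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the baseline is reproducible
    classes = ["cat", "dog", "fox"]  # hypothetical predefined class set
    print(random_class_attack("cat", classes, rng))
    print(self_generated_attack_prompt(classes))
```

In both cases the selected class name would then be superimposed on the image as text; the only difference this sketch captures is how that class is chosen.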