The increasing integration of Visual Language Models (VLMs) into AI systems necessitates robust model alignment, especially when handling multimodal content that combines text and images. Existing evaluation datasets lean heavily towards text-only prompts, leaving visual vulnerabilities under-evaluated. To address this gap, we propose \textbf{Text2VLM}, a novel multi-stage pipeline that adapts text-only datasets into multimodal formats, specifically designed to evaluate the resilience of VLMs against typographic prompt injection attacks. The Text2VLM pipeline identifies harmful content in the original text and converts it into a typographic image, creating a multimodal prompt for VLMs. Our evaluation of open-source VLMs highlights their increased susceptibility to prompt injection when visual inputs are introduced, revealing critical weaknesses in current models' alignment, alongside a significant performance gap compared to closed-source frontier models. We validate Text2VLM through human evaluations, confirming that its extracted salient concepts, text summaries, and output classifications align with human expectations. Text2VLM thus provides a scalable tool for comprehensive safety assessment, contributing to the development of more robust safety mechanisms for VLMs. By enhancing the evaluation of multimodal vulnerabilities, Text2VLM advances the safe deployment of VLMs in diverse, real-world applications.
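To make the typographic-image step concrete, the sketch below renders extracted text onto a plain image with Pillow, which could then be paired with a benign instruction to form a multimodal prompt. This is a minimal illustration, not the paper's implementation: the function name, image dimensions, and wrapping heuristic are all assumptions.

\begin{verbatim}
from PIL import Image, ImageDraw, ImageFont

def text_to_typographic_image(text: str, width: int = 512,
                              height: int = 256) -> Image.Image:
    """Render a text snippet onto a blank image (illustrative sketch)."""
    img = Image.new("RGB", (width, height), color="white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    # Naive word wrapping: accumulate words until a line would overflow.
    words, lines, line = text.split(), [], ""
    for word in words:
        candidate = f"{line} {word}".strip()
        if draw.textlength(candidate, font=font) <= width - 20:
            line = candidate
        else:
            lines.append(line)
            line = word
    lines.append(line)
    # Draw each wrapped line with a fixed vertical spacing.
    for i, row in enumerate(lines):
        draw.text((10, 10 + i * 14), row, fill="black", font=font)
    return img

# Hypothetical usage: embed an extracted salient concept as an image.
image = text_to_typographic_image("Example extracted concept")
image.save("typographic_prompt.png")
\end{verbatim}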