This work focuses on the potential of Vision LLMs (VLLMs) in visual reasoning. Unlike prior studies, we shift our focus from evaluating standard performance to introducing a comprehensive safety evaluation suite that covers both out-of-distribution (OOD) generalization and adversarial robustness. For the OOD evaluation, we present two novel VQA datasets, each with one variant, designed to test model performance under challenging conditions. To explore adversarial robustness, we propose a straightforward attack strategy that misleads VLLMs into producing visually unrelated responses. Moreover, we assess the efficacy of two jailbreaking strategies, targeting either the vision or the language component of VLLMs. Our evaluation of 21 diverse models, ranging from open-source VLLMs to GPT-4V, yields interesting observations: 1) current VLLMs struggle with OOD texts but not OOD images, unless the visual information is limited; and 2) these VLLMs can be easily misled by deceiving the vision encoder alone, and their vision-language training often compromises safety protocols. We release this safety evaluation suite at https://github.com/UCSC-VLAA/vllm-safety-benchmark.