Large Vision-Language Models (LVLMs) combine vision encoders with Large Language Models (LLMs) to achieve remarkable performance on a variety of multi-modal tasks in the joint space of vision and language. However, the Typographic Attack, which disrupts vision-language models (VLMs) such as Contrastive Language-Image Pretraining (CLIP), is also expected to pose a security threat to LVLMs. First, we verify typographic attacks against well-known commercial and open-source LVLMs and uncover the widespread existence of this threat. Second, to better assess this vulnerability, we propose the most comprehensive and largest-scale Typographic Dataset to date. The Typographic Dataset not only supports the evaluation of typographic attacks across diverse multi-modal tasks but also measures how attack effectiveness varies with the factors used to generate the overlaid text. Based on the evaluation results, we investigate why typographic attacks affect VLMs and LVLMs, arriving at three highly insightful discoveries. Through examination of these discoveries and experimental validation on the Typographic Dataset, we reduce the performance degradation of LVLMs under typographic attacks from $42.07\%$ to $13.90\%$.
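A typographic attack sample is simply an image with misleading text rendered onto it. The following is a minimal sketch of how such a sample might be constructed with Pillow; the function name, default position, and color are illustrative assumptions, not the paper's implementation.

```python
from PIL import Image, ImageDraw


def add_typographic_text(image, text, position=(10, 10), color=(255, 255, 255)):
    """Return a copy of `image` with attack `text` overlaid at `position`.

    Illustrative sketch: real attacks may vary font, size, opacity,
    and placement, which are among the text-generation factors the
    Typographic Dataset evaluates.
    """
    attacked = image.copy()  # leave the clean image untouched
    ImageDraw.Draw(attacked).text(position, text, fill=color)
    return attacked


# Example: overlay a misleading class label ("dog") on an image.
clean = Image.new("RGB", (224, 224), (0, 0, 0))
attacked = add_typographic_text(clean, "dog")
```

Such an attacked image, when fed to a CLIP-style classifier or an LVLM, can bias the model toward the rendered word rather than the actual visual content.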