Large Vision-Language Models (LVLMs) have demonstrated outstanding performance across various multimodal tasks. However, they suffer from a problem known as language prior, where responses are generated based solely on textual patterns while disregarding image information. Addressing language priors is crucial, as reliance on them can lead to undesirable biases or hallucinations when models process images that lie outside the training distribution. Despite its importance, methods for accurately measuring language priors in LVLMs remain poorly studied. Although existing benchmarks based on counterfactual or out-of-distribution images can be partially used to measure language priors, they fail to disentangle language priors from other confounding factors. To this end, we propose a new benchmark called VLind-Bench, the first benchmark specifically designed to measure the language priors, or blindness, of LVLMs. It not only includes tests on counterfactual images to assess language priors, but also comprises a series of tests evaluating more basic capabilities such as commonsense knowledge, visual perception, and commonsense bias. For each instance in our benchmark, we ensure that all of these basic tests are passed before evaluating language priors, thereby minimizing the influence of other factors on the assessment. Our evaluation and analysis of recent LVLMs on this benchmark reveal that almost all models exhibit a significant reliance on language priors, presenting a strong challenge to the field.