VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations

Despite their remarkable successes, state-of-the-art language models still struggle to grasp certain important semantic details. This paper introduces the VISLA (Variance and Invariance to Semantic and Lexical Alterations) benchmark, designed to evaluate the semantic and lexical understanding of language models. VISLA presents a 3-way semantic (in)equivalence task built on a triplet of sentences associated with an image, so that it can evaluate both vision-language models (VLMs) and unimodal language models (ULMs). An evaluation of 34 VLMs and 20 ULMs reveals surprising difficulties in distinguishing between lexical and semantic variations. Spatial semantics encoded by language models also appear to be highly sensitive to lexical information. Notably, text encoders of VLMs demonstrate greater sensitivity to semantic and lexical variations than unimodal text encoders. Our contributions include the unification of image-to-text and text-to-text retrieval tasks, an off-the-shelf evaluation that requires no fine-tuning, and an assessment of LMs' semantic (in)variance in the presence of lexical alterations. The results highlight strengths and weaknesses across diverse vision-language and unimodal language models, contributing to a deeper understanding of their capabilities. Data and code will be made available at https://github.com/Sri-Harsha/visla_benchmark.
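To make the triplet task more concrete, the following is a minimal sketch of how one triplet might be scored from precomputed embeddings. It is not the benchmark's actual scoring code. The abstract does not spell out the triplet composition or the decision rule, so the sketch assumes that each triplet holds two semantically equivalent sentences (p1, p2) and one lexically close but semantically different sentence (n), and that a model is counted correct when the equivalent pair scores higher under cosine similarity. The function names and the toy vectors are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def text_to_text_correct(p1: Vector, p2: Vector, n: Vector) -> bool:
    """Assumed rule: the paraphrase p2 must be closer to p1 than n is."""
    return cosine(p1, p2) > cosine(p1, n)


def image_to_text_correct(img: Vector, p1: Vector, p2: Vector, n: Vector) -> bool:
    """Assumed rule: both equivalent captions must outscore n for the image."""
    return min(cosine(img, p1), cosine(img, p2)) > cosine(img, n)


def accuracy(outcomes: Sequence[bool]) -> float:
    """Fraction of triplets judged correct."""
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    # Toy 2-d embeddings standing in for real encoder outputs (hypothetical).
    p1, p2, n = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
    img = [0.8, 0.2]
    outcomes = [
        text_to_text_correct(p1, p2, n),
        image_to_text_correct(img, p1, p2, n),
    ]
    print(accuracy(outcomes))
```

Because both rules consume only embeddings, the same triplet can in principle be used for a text encoder alone or for an image-text model, which is one way to read the "unification of image-to-text and text-to-text retrieval tasks" mentioned above.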
