How and where does CLIP process negation?

Various benchmarks have been proposed to test linguistic understanding in pre-trained vision & language (VL) models. Here we build on the existence task from the VALSE benchmark (Parcalabescu et al., 2022), which we use to test models' understanding of negation, a particularly interesting issue for multimodal models. However, while such VL benchmarks are useful for measuring model performance, they reveal nothing about the internal processes through which these models arrive at their outputs in such visio-linguistic tasks. We take inspiration from the growing literature on model interpretability to explain the behaviour of VL models on the understanding of negation. Specifically, we approach these questions through an in-depth analysis of the text encoder in CLIP (Radford et al., 2021), a highly influential VL model. We localise parts of the encoder that process negation and analyse the role of attention heads in this task. Our contributions are threefold: we demonstrate how methods from the language model interpretability literature (such as causal tracing) can be translated to multimodal models and tasks; we provide concrete insights into how CLIP processes negation on the VALSE existence task; and we highlight inherent limitations in the VALSE dataset as a benchmark for linguistic understanding.

