Despite the impressive performance of autoregressive Language Models (LMs), it has been shown that, due to reporting bias, LMs lack visual knowledge, i.e., they know little about the visual world and its properties. To augment LMs with visual knowledge, existing solutions often rely on explicit images, requiring time-consuming retrieval or image generation systems. This paper shows that explicit images are not necessary to visually augment an LM. Instead, we use visually-grounded text representations obtained from the well-known CLIP multimodal system. For a fair comparison, we modify VALM, a visually-augmented LM which uses image retrieval and representation, to work directly with visually-grounded text representations. We name this new model BLIND-VALM. We show that BLIND-VALM performs on par with VALM on Visual Language Understanding (VLU), Natural Language Understanding (NLU) and Language Modeling tasks, while being significantly simpler and more efficient. We also show that when scaling up our model within the compute budget of VALM, either by increasing the model size or the pre-training corpus size, we outperform VALM on all evaluation tasks.
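To make the core idea concrete, the sketch below shows how visually-grounded text representations can be obtained from CLIP's text encoder alone, with no image retrieval or generation in the loop. It assumes the Hugging Face `transformers` CLIP implementation and an illustrative checkpoint name; the exact CLIP variant and how BLIND-VALM fuses these representations into the LM are not specified here.

```python
# Minimal sketch: visually-grounded text representations from CLIP's text
# encoder (no images involved). Checkpoint choice is an assumption.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

model_name = "openai/clip-vit-base-patch32"  # illustrative CLIP variant
tokenizer = CLIPTokenizer.from_pretrained(model_name)
text_encoder = CLIPTextModel.from_pretrained(model_name)

tokens = tokenizer(["a red apple on a wooden table"], return_tensors="pt")
with torch.no_grad():
    outputs = text_encoder(**tokens)

# Per-token hidden states from a text encoder trained with image-text
# contrastive supervision, hence "visually grounded".
grounded_tokens = outputs.last_hidden_state  # (batch, seq_len, hidden_dim)

# Pooled (EOS-token) state; projecting it (e.g. via
# CLIPTextModelWithProjection) maps it into CLIP's shared image-text space.
pooled = outputs.pooler_output               # (batch, hidden_dim)
```

Because these representations come from a frozen text encoder, producing them costs a single forward pass per context, in contrast to the retrieval index lookups or image generation that explicit-image augmentation requires.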