Modern language models can process inputs across diverse languages and modalities. We hypothesize that models acquire this capability by learning a shared representation space across heterogeneous data types (e.g., different languages and modalities), which places semantically similar inputs near one another even when they come from different modalities or languages. We term this the semantic hub hypothesis, after the hub-and-spoke model from neuroscience (Patterson et al., 2007), which posits that semantic knowledge in the human brain is organized through a transmodal semantic "hub" that integrates information from various modality-specific "spoke" regions. We first show that model representations of semantically equivalent inputs in different languages are similar in the intermediate layers, and that this space can be interpreted through the model's dominant pretraining language via the logit lens. This tendency extends to other data types, including arithmetic expressions, code, and visual/audio inputs. Interventions in the shared representation space for one data type also predictably affect model outputs for other data types, suggesting that this shared representation space is not simply a vestigial byproduct of large-scale training on broad data, but is actively used by the model during input processing.
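The logit lens mentioned above decodes an intermediate hidden state directly into vocabulary logits by applying the model's final normalization and unembedding matrix early, rather than only at the last layer. The following is a minimal sketch with toy, randomly initialized weights; the dimensions, RMS normalization, and matrix names are illustrative assumptions, not the paper's actual models.

```python
import numpy as np

# Toy stand-ins for a transformer's components (illustrative only;
# a real analysis would use a pretrained model's norm and unembedding).
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 50                   # toy dimensions
W_U = rng.normal(size=(d_model, vocab_size))   # unembedding matrix

def rms_norm(h, eps=1e-6):
    """RMS normalization, as used by e.g. Llama-family models."""
    return h / np.sqrt(np.mean(h ** 2) + eps)

def logit_lens(hidden_state):
    """Project an intermediate hidden state into vocabulary logits
    by applying the final norm and unembedding ahead of schedule."""
    return rms_norm(hidden_state) @ W_U

h_mid = rng.normal(size=d_model)   # stand-in intermediate activation
logits = logit_lens(h_mid)         # one logit per vocabulary token
top_token_id = int(np.argmax(logits))
```

Inspecting which tokens receive high logits at intermediate layers is how one can read off, for instance, that a multilingual model's hub space is interpretable in its dominant pretraining language.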