Evidence of Meaning in Language Models Trained on Programs

We present evidence that language models can learn meaning despite being trained only to perform next-token prediction on text, specifically a corpus of programs. Each program is preceded by a specification in the form of (textual) input-output examples. Working with programs enables us to precisely define concepts relevant to meaning in language (e.g., correctness and semantics), making program synthesis well suited as an intermediate testbed for characterizing the presence (or absence) of meaning in language models. We first train a Transformer model on the corpus of programs, then probe the trained model's hidden states as it completes a program given a specification. Although we provide no inductive bias toward learning the semantics of the language, we find that a linear probe is able to extract abstractions of both current and future program states from the model states. Moreover, there is a strong, statistically significant correlation between the accuracy of the probe and the model's ability to generate a program that implements the specification. To evaluate whether the semantics are represented in the model states rather than learned by the probe, we design a novel experimental procedure that intervenes on the semantics of the language while preserving the lexicon and syntax. We also demonstrate that the model learns to generate correct programs that are, on average, shorter than those in the training set, which is evidence that language model outputs may differ from the training distribution in semantically meaningful ways. In summary, this paper does not propose any new techniques for training language models; rather, it develops an experimental framework for studying, and provides insights into, the acquisition and representation of (formal) meaning in language models.
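To make the probing step concrete, the following is a minimal sketch of a linear probe, not the authors' implementation. It assumes hidden states are available as fixed-size vectors and that each one is labeled with a discrete abstraction of the program state. Here both are synthetic stand-ins generated with NumPy, and the names `fit_linear_probe` and `probe_predict` are hypothetical. The probe is fit by ridge-regularized least squares on one-hot targets, a simple choice made for this illustration rather than a method stated in the abstract.

```python
import numpy as np


def fit_linear_probe(states, labels, num_classes, ridge=1e-3):
    """Fit a linear map (with bias) from hidden states to one-hot labels.

    Ridge-regularized least squares is an assumption of this sketch.
    """
    X = np.hstack([states, np.ones((states.shape[0], 1))])  # append bias column
    Y = np.eye(num_classes)[labels]                          # one-hot targets
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)                       # shape (dim + 1, num_classes)


def probe_predict(W, states):
    """Predict the class with the highest linear score."""
    X = np.hstack([states, np.ones((states.shape[0], 1))])
    return np.argmax(X @ W, axis=1)


# Synthetic stand-in for model hidden states: each abstract program state
# (class) is encoded along its own random direction, plus Gaussian noise.
rng = np.random.default_rng(0)
num_classes, dim, n_total, n_train = 4, 32, 2500, 2000
directions = rng.normal(size=(num_classes, dim))
labels = rng.integers(0, num_classes, size=n_total)
states = directions[labels] + 0.5 * rng.normal(size=(n_total, dim))

# Fit the probe on one split and measure accuracy on held-out states.
W = fit_linear_probe(states[:n_train], labels[:n_train], num_classes)
predictions = probe_predict(W, states[n_train:])
accuracy = float(np.mean(predictions == labels[n_train:]))
print(f"held-out probe accuracy: {accuracy:.3f}")
```

High held-out accuracy on this toy data shows only that the labels are linearly decodable from the vectors. As the abstract notes, a separate intervention on the semantics is needed to tell apart what the model states represent from what the probe itself learns.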

