We present evidence that language models (LMs) of code can learn to represent the formal semantics of programs, despite being trained only to perform next-token prediction. Specifically, we train a Transformer model on a synthetic corpus of programs written in a domain-specific language for navigating 2D grid world environments. Each program in the corpus is preceded by a (partial) specification in the form of several input-output grid world states. Although we provide no further inductive biases, we find that a probing classifier can extract increasingly accurate representations of the unobserved, intermediate grid world states from the LM hidden states over the course of training, suggesting that the LM acquires an emergent ability to interpret programs in the formal sense. We also develop a novel interventional baseline that enables us to disambiguate what is represented by the LM from what is learned by the probe. We anticipate that this technique may be applicable to a broad range of semantic probing experiments. In summary, this paper does not propose any new techniques for training LMs of code, but develops an experimental framework for studying, and provides insights into, the acquisition and representation of formal semantics in statistical models of code. Our code is available at https://github.com/charlesjin/emergent-semantics.
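
To make "unobserved, intermediate grid world states" concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical three-command language (`move`, `turn_left`, `turn_right`), a square grid, and a rule that moves into the boundary are no-ops; none of these details come from the abstract. The `trace` function returns the state after each command, which is the kind of target a probe would be asked to recover from hidden states. The optional `remap` argument is likewise an assumption: it reinterprets commands under altered semantics, one plausible way to generate labels for a control that separates what the LM represents from what the probe learns.

```python
from typing import Dict, List, Optional, Tuple

# State of the hypothetical agent: (row, col, heading), where heading
# indexes HEADINGS below.
State = Tuple[int, int, int]

# Headings in clockwise order: north, east, south, west, as (d_row, d_col).
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def step(state: State, command: str, size: int) -> State:
    """Apply one command of the toy DSL to a state on a size x size grid."""
    row, col, heading = state
    if command == "turn_left":
        return (row, col, (heading - 1) % 4)
    if command == "turn_right":
        return (row, col, (heading + 1) % 4)
    if command == "move":
        d_row, d_col = HEADINGS[heading]
        new_row, new_col = row + d_row, col + d_col
        # Assumption: moving into the boundary leaves the agent in place.
        if 0 <= new_row < size and 0 <= new_col < size:
            return (new_row, new_col, heading)
        return state
    raise ValueError(f"unknown command: {command!r}")


def trace(program: List[str], start: State, size: int,
          remap: Optional[Dict[str, str]] = None) -> List[State]:
    """Return the state after each command, i.e. the intermediate states.

    `remap` (hypothetical) substitutes commands before execution, giving
    the trace of the same program text under altered semantics.
    """
    remap = remap or {}
    states = []
    state = start
    for command in program:
        state = step(state, remap.get(command, command), size)
        states.append(state)
    return states


program = ["move", "turn_right", "move", "move"]
start = (2, 0, 0)  # row 2, column 0, facing north

# Labels under the original semantics.
original = trace(program, start, size=3)

# Labels for the same program text with the two turn commands swapped.
swapped = trace(program, start, size=3,
                remap={"turn_left": "turn_right", "turn_right": "turn_left"})
```

Here `original` ends at `(1, 2, 1)`, whereas `swapped` stays against the west wall at `(1, 0, 3)`, so the two label sets diverge from the first turn onward even though the program text is identical.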