To correctly use in-context information, language models (LMs) must bind entities to their attributes. For example, given a context describing a "green square" and a "blue circle", LMs must bind the shapes to their respective colors. We analyze LM representations and identify the binding ID mechanism: a general mechanism for solving the binding problem, which we observe in every sufficiently large model from the Pythia and LLaMA families. Using causal interventions, we show that LMs' internal activations represent binding information by attaching binding ID vectors to corresponding entities and attributes. We further show that binding ID vectors form a continuous subspace, in which distances between binding ID vectors reflect their discernibility. Overall, our results uncover interpretable strategies in LMs for representing symbolic knowledge in context, providing a step towards understanding general in-context reasoning in large-scale LMs.
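As a rough illustration of the idea, the toy sketch below tags each entity and attribute with a binding ID vector, answers a query by matching IDs, and then swaps the IDs to mimic a causal intervention. Everything in it is an assumption made for exposition: the two-dimensional vectors, the dictionary layout, and the helper names (`make_context`, `lookup`, `swap_attribute_ids`) are hypothetical and do not come from the models or experiments described here.

```python
import math

def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def make_context(pairs, binding_ids):
    """Tag the k-th (entity, attribute) pair with the k-th binding ID vector."""
    entities = {ent: list(bid) for (ent, _), bid in zip(pairs, binding_ids)}
    attributes = {attr: list(bid) for (_, attr), bid in zip(pairs, binding_ids)}
    return entities, attributes

def lookup(entity, entities, attributes):
    """Return the attribute whose binding ID is closest to the entity's."""
    query = entities[entity]
    return min(attributes, key=lambda attr: distance(query, attributes[attr]))

def swap_attribute_ids(attributes, a, b):
    """Toy intervention: exchange the binding IDs carried by two attributes."""
    swapped = dict(attributes)
    swapped[a], swapped[b] = attributes[b], attributes[a]
    return swapped

# Hypothetical binding ID vectors, chosen by hand for this sketch.
pairs = [("square", "green"), ("circle", "blue")]
ids = [[1.0, 0.0], [0.0, 1.0]]
entities, attributes = make_context(pairs, ids)

# Before the intervention, each shape retrieves its own color.
print(lookup("square", entities, attributes))

# After swapping the attribute IDs, the retrieved color flips.
swapped = swap_attribute_ids(attributes, "green", "blue")
print(lookup("square", entities, swapped))

# Distance between the two IDs: a crude proxy for how separable they are.
separation = distance(ids[0], ids[1])
```

In this toy setting, moving the two ID vectors closer together shrinks `separation` and makes the nearest-ID lookup less robust to noise, which is one simple way to picture distances reflecting discernibility.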