In this paper, we investigate the interplay between attention heads and specialized "next-token" neurons, i.e., neurons in the multilayer perceptron (MLP) layers that predict specific tokens. By prompting an LLM such as GPT-4 to explain these model internals, we elucidate the attention mechanisms that activate certain next-token neurons. Our analysis identifies attention heads that recognize contexts relevant to predicting a particular token and activate the associated neuron through the residual connection. We focus specifically on heads in earlier layers that consistently activate the same next-token neuron across similar prompts. Exploring these differential activation patterns reveals that heads specialized for distinct linguistic contexts are tied to generating certain tokens. Overall, our method combines neural explanations with probing of isolated components to illuminate how attention enables context-dependent, specialized processing in LLMs.
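The idea that a head activates a neuron "through the residual connection" can be illustrated with a small sketch. It is not the method of this paper, which relies on LLM-generated explanations. It is a simplified direct-attribution calculation under stated assumptions: each head writes a vector into the residual stream, the neuron's pre-activation is a dot product between its input weights and that stream, and layer normalization is ignored. The names `head_contributions`, `consistent_heads`, the head labels, and all numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def head_contributions(
    head_outputs: Dict[str, Sequence[float]],
    neuron_in_weights: Sequence[float],
) -> Dict[str, float]:
    """Direct contribution of each head's residual-stream write to one
    neuron's pre-activation (assumption: no layer norm in between)."""
    return {name: dot(out, neuron_in_weights) for name, out in head_outputs.items()}


def consistent_heads(
    per_prompt: List[Dict[str, float]], threshold: float
) -> List[str]:
    """Heads whose contribution exceeds `threshold` on every prompt."""
    if not per_prompt:
        return []
    names = per_prompt[0].keys()
    return sorted(n for n in names if all(p[n] > threshold for p in per_prompt))


# Toy data (hypothetical): two heads, two similar prompts, 3-dim residual stream.
neuron_w = [0.5, -1.0, 0.25]
prompt_a = {"L0H1": [1.0, 0.0, 2.0], "L0H2": [0.0, 1.0, 0.0]}
prompt_b = {"L0H1": [2.0, 0.0, 0.0], "L0H2": [0.0, -1.0, 0.0]}

scores = [head_contributions(p, neuron_w) for p in (prompt_a, prompt_b)]
print(scores)
print(consistent_heads(scores, threshold=0.5))
```

Because the pre-activation is linear in the residual stream under these assumptions, the per-head scores sum to the contribution of all heads combined. In this toy setup only `L0H1` pushes the neuron above the threshold on both prompts, which is the kind of consistency across similar prompts that the abstract describes.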