What kind of internal mechanisms might Transformers use to conduct fluid, natural-sounding conversations? Prior work has illustrated by construction how Transformers can solve various synthetic tasks, such as sorting a list or recognizing formal languages, but it remains unclear how to extend this approach to a conversational setting. In this work, we propose using ELIZA, a classic rule-based chatbot, as a setting for formal, mechanistic analysis of Transformer-based chatbots. ELIZA allows us to formally model key aspects of conversation, including local pattern matching and long-term dialogue state tracking. We first present a theoretical construction of a Transformer that implements the ELIZA chatbot. Building on prior constructions, particularly those for simulating finite-state automata, we show how simpler mechanisms can be composed and extended to produce more sophisticated behavior. Next, we conduct a set of empirical analyses of Transformers trained on synthetically generated ELIZA conversations. Our analysis illustrates the kinds of mechanisms these models tend to prefer: for example, models favor an induction-head mechanism over a more precise, position-based copying mechanism, and they use intermediate generations to simulate recurrent data structures, akin to an implicit scratchpad or Chain-of-Thought. Overall, by drawing an explicit connection between neural chatbots and interpretable, symbolic mechanisms, our results provide a new framework for the mechanistic analysis of conversational agents.
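To make the local pattern matching mentioned above concrete, the following is a minimal sketch of an ELIZA-style decomposition/reassembly step. The specific rules and response templates here are illustrative inventions, not ELIZA's original script: each rule pairs a decomposition pattern with a reassembly template that reuses the captured fragment.

```python
import re

# Illustrative ELIZA-style rules (hypothetical contents, not the original
# DOCTOR script): a decomposition regex paired with a reassembly template.
RULES = [
    (re.compile(r"\bI am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.*)", re.IGNORECASE), "Tell me more about feeling {0}."),
]
FALLBACK = "Please go on."  # used when no decomposition rule matches

def respond(utterance: str) -> str:
    """Return the first matching rule's reassembled response."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Reassembly: splice the captured fragment into the template.
            return template.format(*match.groups())
    return FALLBACK

print(respond("I am sad"))   # matches the first rule
print(respond("hello"))      # no rule matches, falls back
```

The full ELIZA program additionally applies pronoun transformations (e.g. "my" to "your") and tracks dialogue state across turns; those long-range aspects are what make the chatbot an interesting target for the Transformer constructions described in the abstract.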