Thought Anchors: Which LLM Reasoning Steps Matter?

From arXiv. Paul C. Bogdan and Uzay Macar contributed equally to this work, and their listed order was determined by coin flip. Neel Nanda and Arthur Conmy contributed equally to this work as senior authors, and their listed order was determined by coin flip.

Current frontier large language models rely on reasoning to achieve state-of-the-art performance. Many existing interpretability methods are limited in this area, as standard methods have been designed to study single forward passes of a model rather than the multi-token computational steps that unfold during reasoning. We argue that analyzing reasoning traces at the sentence level is a promising approach to understanding reasoning processes. We introduce a black-box method that measures each sentence's counterfactual importance by repeatedly sampling replacement sentences from the model, filtering for semantically different ones, and continuing the chain of thought from that point onwards to quantify the sentence's impact on the distribution of final answers. We discover that certain sentences can have an outsized impact on the trajectory of the reasoning trace and final answer. We term these sentences thought anchors. These are generally planning or uncertainty-management sentences, and specialized attention heads consistently attend from subsequent sentences to thought anchors. We further show that examining sentence-to-sentence causal links within a reasoning trace gives insight into a model's behavior. Such information can be used to predict a problem's difficulty and the extent to which different question domains involve sequential or diffuse reasoning. As a proof of concept, we demonstrate that our techniques together provide a practical toolkit for analyzing reasoning models by conducting a detailed case study of how the model solves a difficult math problem, finding that our techniques yield a consistent picture of the reasoning trace's structure. We provide an open-source tool (thought-anchors.com) for visualizing the outputs of our methods on further problems. The convergence across our methods shows the potential of sentence-level analysis for a deeper understanding of reasoning models.
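
The counterfactual-importance measurement described above can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several parts are assumptions chosen for illustration. Rollouts are taken as already generated: one set of final answers from continuing after the original sentence, and one set of (replacement-sentence embedding, final answer) pairs from continuing after resampled sentences. Sentence embeddings are assumed precomputed. The cosine-similarity cutoff of 0.8, the add-one smoothing, and the use of KL divergence as the measure of impact are illustrative choices. The names `Rollout`, `counterfactual_importance`, and `similarity_threshold` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical rollout record: (embedding of the replacement sentence, final answer reached)
Rollout = Tuple[Sequence[float], str]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two (non-zero) embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def smoothed_distribution(answers: Sequence[str], support: Sequence[str],
                          alpha: float = 1.0) -> List[float]:
    """Empirical answer distribution over `support` with add-alpha smoothing,
    so every probability is positive and the KL divergence stays finite."""
    counts = Counter(answers)
    total = len(answers) + alpha * len(support)
    return [(counts[a] + alpha) / total for a in support]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; assumes both are strictly positive and sum to 1."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def counterfactual_importance(original_embedding: Sequence[float],
                              kept_answers: Sequence[str],
                              resampled: Sequence[Rollout],
                              similarity_threshold: float = 0.8) -> float:
    """Divergence between the answer distribution when the sentence is replaced
    by a semantically different one and the distribution when it is kept."""
    # Keep only replacements that are semantically different from the original.
    different_answers = [
        answer for embedding, answer in resampled
        if cosine_similarity(embedding, original_embedding) < similarity_threshold
    ]
    if not different_answers:
        # No semantically different replacements: importance is undefined here.
        return math.nan
    support = sorted(set(kept_answers) | set(different_answers))
    p_replaced = smoothed_distribution(different_answers, support)
    p_kept = smoothed_distribution(kept_answers, support)
    return kl_divergence(p_replaced, p_kept)


if __name__ == "__main__":
    original = [1.0, 0.0]                  # toy embedding of the original sentence
    kept = ["42"] * 9 + ["41"]             # answers when the sentence is kept
    resampled = ([([0.0, 1.0], "17")] * 8  # different sentence, different answer
                 + [([1.0, 0.05], "42")] * 2)  # near-paraphrase, filtered out
    score = counterfactual_importance(original, kept, resampled)
    print(f"counterfactual importance: {score:.3f}")
```

In this toy example, replacing the sentence with a semantically different one shifts most answers from "42" to "17", so the divergence is large. Such a sentence would be a candidate thought anchor. If the replacements led to the same answers as the original, the score would be zero. Filtering out near-paraphrases matters because resampled sentences that merely restate the original would otherwise dilute the measured effect.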

