Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning

Test-time compute allocation in large reasoning models (LRMs) is widely used, with applications in mathematical problem solving, code synthesis, and planning. Recent work has addressed this problem by scaling self-consistency and parallel thinking, adding generic ``thinking tokens'', and prompting models to re-read the question before answering. Unfortunately, these approaches either inject task-agnostic tokens or mandate heuristics that do not explain, and often ignore, the \emph{spontaneous} repetition that many LRMs exhibit at the head of their internal chains. In contrast, we analyze and harness the model's tendency to restate the question, which we term the \emph{Echo of Prompt (EOP)}, as a front-loaded, compute-shaping mechanism. We formalize its probabilistic cost by casting echo removal as rejection-based conditioning and defining the \emph{Echo Likelihood Gap} $\Delta\mathcal{L}$ as a computable proxy. This provides the missing theoretical link between early repetition, likelihood gains, and downstream accuracy. However, it does not by itself specify how to exploit EOP. Consequently, we develop \emph{Echo-Distilled SFT (ED-SFT)} to instill an ``echo-then-reason'' pattern through supervised finetuning, and \emph{Echoic Prompting (EP)} to re-ground the model mid-trace without training. While these methods are promising, quantifying their benefits beyond mere verbosity is non-trivial. Therefore, we conduct length- and suffix-controlled likelihood analyses together with layer-wise attention studies, showing that EOP increases answer-to-answer-prefix attention in middle layers, consistent with an \emph{attention refocusing} mechanism. We evaluate on GSM8K, MathQA, Hendrycks-MATH, AIME24, and MATH-500 under identical decoding settings and budgets, and find consistent gains over baselines. Code is available at https://github.com/hhh2210/echoes-as-anchors.
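
To make the idea of a likelihood gap concrete, the following is a minimal sketch and not the paper's definition: the abstract does not state the formula for $\Delta\mathcal{L}$. As an illustrative assumption, the sketch treats the gap as the difference between the log-likelihood of the same answer tokens when the reasoning trace keeps its echo and when the echo has been removed. The function names (`sequence_log_likelihood`, `likelihood_gap`), the optional length normalization, and the toy log-probabilities are all hypothetical; in practice the per-token log-probabilities would come from scoring the answer with the model under each context.

```python
from typing import Sequence


def sequence_log_likelihood(token_logprobs: Sequence[float],
                            normalize: bool = False) -> float:
    """Sum of per-token log-probabilities, or their mean if normalize=True."""
    if not token_logprobs:
        raise ValueError("token_logprobs must be non-empty")
    total = sum(token_logprobs)
    return total / len(token_logprobs) if normalize else total


def likelihood_gap(with_echo: Sequence[float],
                   without_echo: Sequence[float],
                   normalize: bool = False) -> float:
    """Assumed proxy: answer log-likelihood with the echo minus without it.

    A positive value means the answer tokens are more likely when the
    echo is kept in the context. This is an illustrative stand-in, not
    the definition used in the paper.
    """
    return (sequence_log_likelihood(with_echo, normalize)
            - sequence_log_likelihood(without_echo, normalize))


# Toy per-token log-probabilities for the same three answer tokens,
# scored under the two contexts (made-up numbers for illustration only).
logp_with_echo = [-0.5, -0.25, -0.25]
logp_without_echo = [-1.0, -0.5, -0.5]

gap = likelihood_gap(logp_with_echo, logp_without_echo)
```

With these toy numbers the summed gap is positive, which under the stated assumption would indicate that the echo raises the likelihood of the answer. Setting `normalize=True` gives a per-token version, a simple guard when the compared sequences differ in length.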
