Can Transformers Learn to Verify During Backtracking Search?

Backtracking search underlies classical constraint solvers, planners, and theorem provers. Recent transformer-based reasoning systems explore search trees over their own intermediate steps. A common training recipe fits an autoregressive next-token loss on offline solver traces. Under this recipe, the model's input at each step is the cumulative trace of all prior decisions. Yet the optimal continue-or-backtrack predictor depends only on the current search state, since two trajectories that reach the same state admit the same viable continuations. We show that decoder-only transformers trained on cumulative traces fail to meet this requirement in two ways: the trace can scatter state features across many positions (scattered retrieval), and the predictor can condition on the trajectory rather than the state (history entanglement). We address scattered retrieval with localization, a trace-level fix that rewrites each decision block to expose state features locally. We address history entanglement with Selective State Attention (SSA), a fixed attention mask that enforces state-based decisions structurally without modifying training data, objective, or parameters. We focus on reactive verification, that is, verification performed after propagation has exposed a contradiction. We test SSA on 3-SAT, graph coloring, Blocks World, and backtracking parsing. On same-state pairs that differ only in prior history, SSA emits identical decisions while a cumulative-trained causal baseline does not. Our contribution is a diagnostic of transformer behavior on serialized trajectory data, paired with a structural fix. Pretrained language models that search over their own reasoning steps may face the same failure. Our analysis suggests inference-time context clearing as a candidate way to apply the same isolation without retraining.
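To make the idea of a fixed, state-isolating attention mask concrete, here is a minimal toy sketch in Python with NumPy. The abstract does not specify the exact form of the SSA mask, so everything here is an assumption for illustration: the helper names (`state_block_mask`, `masked_attention`, `last_output`), the choice to let each token attend only to earlier tokens in its own decision block, the single attention head with identity projections, and the absence of positional encodings. It compares two trajectories that end in the same state block but have different histories.

```python
import numpy as np

def causal_mask(n):
    """Standard causal mask: position i may attend to positions j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def state_block_mask(block_ids):
    """Hypothetical state-local mask (an assumption, not the paper's SSA):
    causal attention restricted to tokens in the same decision block."""
    ids = np.asarray(block_ids)
    same_block = ids[:, None] == ids[None, :]
    return causal_mask(len(ids)) & same_block

def masked_attention(x, mask):
    """Toy single-head self-attention with identity projections."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = np.where(mask, scores, -np.inf)
    # Every row keeps its diagonal entry, so the row maximum is finite.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x

def last_output(history, history_blocks, state, local):
    """Attention output at the final token of the current state block."""
    x = np.vstack([history, state])
    ids = list(history_blocks) + [max(history_blocks) + 1] * len(state)
    mask = state_block_mask(ids) if local else causal_mask(len(x))
    return masked_attention(x, mask)[-1]

rng = np.random.default_rng(0)
d = 8
state = rng.normal(size=(3, d))      # tokens of the shared current state
history_a = rng.normal(size=(4, d))  # earlier decision blocks, trajectory A
history_b = rng.normal(size=(6, d))  # earlier decision blocks, trajectory B
blocks_a = [0, 0, 1, 1]
blocks_b = [0, 0, 1, 1, 2, 2]

local_a = last_output(history_a, blocks_a, state, local=True)
local_b = last_output(history_b, blocks_b, state, local=True)
causal_a = last_output(history_a, blocks_a, state, local=False)
causal_b = last_output(history_b, blocks_b, state, local=False)

print("state-local outputs match:", np.allclose(local_a, local_b))
print("plain causal outputs match:", np.allclose(causal_a, causal_b))
```

In this toy setting the state-local mask removes every path from the history to the output at the current block, so the two trajectories are treated identically by construction, whereas the plain causal mask lets the differing histories influence the result. A real model would also need positional information that does not leak trajectory length, which this sketch sidesteps by omitting positions entirely.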
