Looking Is Not Picking: An Attention-Segment Account of Tool-Selection Failures in LLM Agents

LLM agents mis-call tools, and the natural guess is that the model failed to see the right tool in a crowded harness. We show the opposite through a lens that concurrent work sets aside: the model's attention to labeled tool-definition segments. On real BFCL failures, the per-candidate attention argmax lands on the correct tool 80% of the time (vs. 21% chance), and the gold tool is the under-attended segment in only 10% of cases: the model looks at the right tool and still picks wrong. This directly refutes the intuitive "crowded-harness / lost-in-the-middle" explanation: the failure lies at the decision readout, not in the harness, and we localize it there in three ways. (1) Input vs. readout: repairing the prompt (reordering or duplicating the gold tool) recovers <=23% of failures, while readout-side interventions recover 59-91%. (2) Representation invariance: two gold-pointed interventions in different representations (an additive attention-logit bias and a residual-stream steering vector) recover largely the same failures (per-task Jaccard 0.865 pooled, 0.79-0.91 per model), so the bottleneck is localized to the readout regardless of which representation is perturbed. (3) A training-free, gold-free selector: per-segment attention closes most of the gold-free-vs-oracle gap on BFCL (+11.9 pts pooled function-name selection vs. +17.9-pt oracle headroom) and adds +14.9 pts on Seal-Tools; the gain is positive for every model (exact McNemar p<=8e-4 each). Scopes differ: the causal attention-bias dose-response is bidirectional and monotonic on 10 mask-honoring models (3-32B), whereas the full 0.5-32B span supports only the correlational diagnostic; the deployable selector is evaluated on 5 single-turn models and does not yet transfer to a multi-turn loop.
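
As an illustration of the two quantities the abstract leans on, the following is a minimal sketch of a per-segment attention argmax and of the Jaccard overlap between two sets of recovered failures. It is not the paper's implementation: the function names (`segment_attention`, `select_tool`, `jaccard`), the half-open token spans, and the choice to average attention over the tokens of each segment are all assumptions made for the example, and the attention vector is assumed to be already reduced to one weight per prompt token (the abstract does not specify how layers and heads are aggregated).

```python
from typing import Dict, Sequence, Set, Tuple


def segment_attention(
    attn: Sequence[float],
    segments: Dict[str, Tuple[int, int]],
    normalize_by_length: bool = True,
) -> Dict[str, float]:
    """Aggregate per-token attention into one score per tool-definition segment.

    `attn` holds one attention weight per prompt token (assumed to be already
    averaged over heads/layers). `segments` maps a tool name to a half-open
    token span [start, end). Length normalization is an assumption of this
    sketch, so that long tool definitions do not win on size alone.
    """
    scores: Dict[str, float] = {}
    for name, (start, end) in segments.items():
        if not 0 <= start < end <= len(attn):
            raise ValueError(f"invalid span for {name!r}: {(start, end)}")
        mass = sum(attn[start:end])
        scores[name] = mass / (end - start) if normalize_by_length else mass
    return scores


def select_tool(attn: Sequence[float], segments: Dict[str, Tuple[int, int]]) -> str:
    """Return the tool whose segment receives the highest attention score."""
    scores = segment_attention(attn, segments)
    return max(scores, key=lambda name: scores[name])


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap |a & b| / |a | b| between two sets of task ids."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical toy prompt with three tool definitions (values are made up).
toy_attn = [0.05, 0.05, 0.10, 0.30, 0.30, 0.05, 0.05, 0.10]
toy_segments = {
    "get_weather": (0, 3),
    "get_forecast": (3, 5),
    "send_email": (5, 8),
}
print(select_tool(toy_attn, toy_segments))

# Hypothetical sets of failures recovered by two different interventions.
print(jaccard({"t1", "t2", "t3"}, {"t2", "t3", "t4"}))
```

In this toy setup, comparing the selected tool against the gold label gives the "looks at the right tool" rate, and the Jaccard function applied to the task ids recovered by two interventions gives the kind of overlap reported in (2).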
