LLM agents mis-call tools, and the natural guess is that the model failed to see the right tool in a crowded harness. We show the opposite through a lens that concurrent work sets aside: the model's attention to labeled tool-definition segments. On real BFCL failures, the per-candidate attention argmax falls on the correct tool 80% of the time (vs. 21% chance), and the gold tool is the under-attended segment in only 10% of cases. The model looks at the right tool and still picks the wrong one. This directly refutes the intuitive "crowded-harness / lost-in-the-middle" explanation: the failure is at the decision readout, not the harness, and we pin it there three ways. (1) Input vs. readout: repairing the prompt (reordering or duplicating the gold tool) recovers <=23% of failures, while readout-side interventions recover 59-91%. (2) Representation-invariance: two gold-pointed interventions in different representations, an additive attention-logit bias and a residual-stream steering vector, recover largely the same failures (per-task Jaccard 0.865 pooled, 0.79-0.91 per model), so the bottleneck is localized to the readout regardless of which representation is perturbed. (3) A training-free, gold-free selector: per-segment attention closes most of the gold-free-vs-oracle gap on BFCL (+11.9 pts pooled function-name selection vs. +17.9-pt oracle headroom) and adds +14.9 pts on Seal-Tools; the gain is positive for every model (exact McNemar p<=8e-4 each). Scopes differ: the causal attention-bias dose-response is bidirectional and monotonic on 10 mask-honoring models (3-32B), while the full 0.5-32B span carries only the correlational diagnostic; the deployable selector is evaluated on 5 single-turn models and does not yet transfer to a multi-turn loop.
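To make the per-candidate attention argmax concrete, the following is a minimal sketch of a gold-free selector over labeled tool-definition segments. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how attention is aggregated across layers and heads or whether segment scores are length-normalized, so the input here is assumed to be a single already-aggregated attention vector from the decision position to the prompt tokens, and the names `segment_attention_scores`, `select_tool`, and `normalize_by_length` are hypothetical.

```python
from typing import Dict, List, Tuple


def segment_attention_scores(
    attn: List[float],
    segments: Dict[str, Tuple[int, int]],
    normalize_by_length: bool = True,
) -> Dict[str, float]:
    """Score each labeled tool segment by the attention it receives.

    attn: attention weights from the decision position to each prompt token,
          assumed to be pre-aggregated over layers and heads (an assumption).
    segments: tool name -> (start, end) token span, end exclusive.
    normalize_by_length: if True, use mean attention per token so that long
          tool definitions are not favored merely for being long (an assumption).
    """
    scores: Dict[str, float] = {}
    for name, (start, end) in segments.items():
        if not 0 <= start < end <= len(attn):
            raise ValueError(f"invalid span for {name!r}: ({start}, {end})")
        mass = sum(attn[start:end])
        scores[name] = mass / (end - start) if normalize_by_length else mass
    return scores


def select_tool(
    attn: List[float],
    segments: Dict[str, Tuple[int, int]],
    normalize_by_length: bool = True,
) -> str:
    """Return the tool whose segment has the highest attention score (argmax)."""
    scores = segment_attention_scores(attn, segments, normalize_by_length)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy example: four hypothetical tools, two prompt tokens each.
    attn = [0.05, 0.05, 0.10, 0.10, 0.30, 0.25, 0.05, 0.10]
    segments = {
        "get_weather": (0, 2),
        "search_flights": (2, 4),
        "book_hotel": (4, 6),
        "send_email": (6, 8),
    }
    print(select_tool(attn, segments))  # prints "book_hotel"
```

The selector needs no gold label and no training: it reads only the attention the model already produces. Whether to score by total or mean attention mass is a design choice this sketch exposes as a flag, since the two can disagree when tool definitions differ greatly in length.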