User instructions are often underspecified because humans rely on implicit assumptions about the surrounding environment. For large language model (LLM) agents operating in information-rich digital and physical environments, these assumptions cannot be inferred from the instruction alone; they must be recovered from the current state of tools, data, interfaces, and observations. Effective execution therefore requires agents to identify missing context, ground it in observed evidence, and carry it forward into subsequent actions. We show that current agents often fail to do so. They act from assumed rather than observed specifics, overlook information they could have gathered, and fail to incorporate evidence that has already been returned. Building on this insight, we propose ACCORD (Action-Conditioned Contextual Grounding), a simple and effective agent framework for adaptive grounding. Before each action, ACCORD actively probes the environment for missing information and integrates relevant context from the agent's trajectory that would otherwise be overlooked. Requiring no additional training or task-success signals, ACCORD improves task-goal completion on AppWorld by up to +20.6 points with GPT-5-mini, from 42.0% to 62.6%, compared to strong baselines. These gains persist with a substantially stronger base model (+10.8 with Claude-4.5-sonnet), an open-weight model (+10.1 with Qwen3.5-27B-FP8), and on the embodied AlfWorld benchmark (+7.4 success rate with GPT-5-mini).
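To make the "probe before acting, reuse what was already observed" idea concrete, the following is a minimal sketch in Python. It is not the ACCORD implementation: the abstract does not specify one, and the class `GroundedAgent`, its methods, the dictionary-based environment, and the key names are all hypothetical choices made for illustration. The sketch only assumes that each action declares which specifics it needs, and that a specific may be used only if it was observed earlier in the trajectory or can be observed now.

```python
from dataclasses import dataclass, field


@dataclass
class GroundedAgent:
    """Toy agent (hypothetical) that grounds action arguments in observations."""
    env: dict                                        # stand-in for observable state
    trajectory: list = field(default_factory=list)   # (key, value) pairs seen so far

    def recall(self, key):
        """Return the most recent value already observed for `key`, if any."""
        for seen_key, value in reversed(self.trajectory):
            if seen_key == key:
                return value
        return None

    def probe(self, key):
        """Query the environment for `key` and record whatever is observed."""
        value = self.env.get(key)
        if value is not None:
            self.trajectory.append((key, value))
        return value

    def ground(self, required):
        """Resolve each required specific: trajectory first, then a fresh probe."""
        grounded, missing = {}, []
        for key in required:
            value = self.recall(key)
            if value is None:
                value = self.probe(key)
            if value is None:
                missing.append(key)      # never filled in with a guess
            else:
                grounded[key] = value
        return grounded, missing

    def act(self, action, required):
        """Emit the action only if every required specific is backed by evidence."""
        grounded, missing = self.ground(required)
        if missing:
            return {"action": "ask_or_explore", "missing": missing}
        return {"action": action, "args": grounded}


# Hypothetical usage: the environment knows the user and playlist, not the recipient.
env = {"logged_in_user": "alice", "default_playlist": "Road Trip"}
agent = GroundedAgent(env)
first = agent.act("add_song", ["logged_in_user", "default_playlist"])
second = agent.act("share_playlist", ["default_playlist", "recipient"])
```

In this toy run, `first` carries arguments that were both probed from the environment, while `second` reuses the playlist already in the trajectory and reports `recipient` as missing instead of assuming a value. A real agent would replace the dictionary lookup with tool or API calls and would let the language model decide which specifics an action requires.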