From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents

Embodied agents operating in multi-agent, partially observable, and decentralized environments must plan and act despite pervasive uncertainty about hidden objects and collaborators' intentions. Recent advances in applying Large Language Models (LLMs) to embodied agents have addressed many long-standing challenges, such as high-level goal decomposition and online adaptation. Yet uncertainty is still mitigated primarily through frequent inter-agent communication. This incurs substantial token and time costs and, when human partners are involved, can disrupt established workflows. We introduce PCE, a Planner-Composer-Evaluator framework that converts the fragmented assumptions latent in LLM reasoning traces into a structured decision tree. Internal nodes encode environment assumptions and leaves map to actions; each path is then scored by scenario likelihood, goal-directed gain, and execution cost to guide rational action selection without heavy communication. Across two challenging multi-agent benchmarks (C-WAH and TDW-MAT) and three diverse LLM backbones, PCE consistently outperforms communication-centric baselines in success rate and task efficiency while showing comparable token usage. Ablation results indicate that the gains from scaling model capacity or reasoning depth persist when PCE is applied, and that PCE consistently raises performance over the baseline across both capacity and reasoning-depth scales, confirming that structured uncertainty handling complements both forms of scaling. A user study further demonstrates that PCE produces communication patterns that human partners perceive as more efficient and trustworthy. Together, these results establish a principled route for turning latent LLM assumptions into reliable strategies for uncertainty-aware planning.
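
To make the decision-tree idea concrete, the following is a minimal Python sketch of path scoring, not the PCE implementation. The abstract states only that each path is scored by scenario likelihood, goal-directed gain, and execution cost; the combination rule used here (likelihood times gain, minus cost), the `Node` class, the function names, and all numbers in the example are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Internal nodes hold an environment assumption; leaves hold an action.

    All fields are hypothetical: the abstract does not specify a data layout.
    """
    label: str
    prob: float = 1.0   # probability of this branch given its parent
    gain: float = 0.0   # goal-directed gain (meaningful on leaves only)
    cost: float = 0.0   # execution cost (meaningful on leaves only)
    children: List["Node"] = field(default_factory=list)


def iter_paths(node: Node, likelihood: float = 1.0) -> Iterator[Tuple[Node, float]]:
    """Yield (leaf, scenario likelihood) for every root-to-leaf path."""
    likelihood *= node.prob
    if not node.children:
        yield node, likelihood
        return
    for child in node.children:
        yield from iter_paths(child, likelihood)


def score(leaf: Node, likelihood: float) -> float:
    """Assumed scoring rule: expected gain under the scenario minus cost."""
    return likelihood * leaf.gain - leaf.cost


def best_action(root: Node) -> Node:
    """Return the leaf (action) whose path has the highest score."""
    return max(iter_paths(root), key=lambda pair: score(*pair))[0]


# Illustrative tree: where is the apple, and what should the agent do next?
root = Node("start", children=[
    Node("apple is still in the kitchen", prob=0.75, children=[
        Node("apple is in the fridge", prob=0.5, children=[
            Node("open fridge", gain=8.0, cost=1.0),
        ]),
        Node("apple is on the table", prob=0.5, children=[
            Node("check table", gain=8.0, cost=2.0),
        ]),
    ]),
    Node("partner already took the apple", prob=0.25, children=[
        Node("send message to partner", gain=8.0, cost=4.0),
    ]),
])

for leaf, likelihood in iter_paths(root):
    print(f"{leaf.label}: likelihood={likelihood}, score={score(leaf, likelihood)}")
print("selected:", best_action(root).label)
```

In this toy tree, communication appears as one leaf among others and is selected only when its score beats acting on an assumption. That is one way to read "without heavy communication", under the scoring rule assumed above.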
