Large language models (LLMs) have shown significant potential for guiding embodied agents to execute language instructions across a range of tasks, including robotic manipulation and navigation. However, existing methods are designed primarily for static environments and do not use the agent's own experiences to refine its initial plans. Because real-world environments, unlike static scenarios, are inherently stochastic, initial plans based solely on the general knowledge of LLMs may fail to achieve their objectives. To address this limitation, this study introduces the Experience-and-Emotion Map (E2Map), which integrates not only LLM knowledge but also the agent's real-world experiences, drawing inspiration from human emotional responses. The proposed method enables one-shot behavior adjustment by updating the E2Map based on the agent's experiences. Our evaluation in stochastic navigation environments, covering both simulations and real-world scenarios, demonstrates that the proposed method significantly improves performance compared to existing LLM-based approaches. Code and supplementary materials are available at https://e2map.github.io/.