Planning under partial observation is a central challenge in embodied AI. Most prior work has tackled this challenge by developing agents that physically explore their environment to update their beliefs about the world state. In contrast, humans can $\textit{imagine}$ unseen parts of the world through mental exploration and $\textit{revise}$ their beliefs with the imagined observations. These updated beliefs allow them to make more informed decisions without having to physically explore the world at all times. To achieve this human-like ability, we introduce the $\textit{Generative World Explorer (Genex)}$, an egocentric world exploration framework that allows an agent to mentally explore a large-scale 3D world (e.g., urban scenes) and acquire imagined observations to update its belief. The updated belief then helps the agent make a more informed decision at the current step. To train $\textit{Genex}$, we create a synthetic urban scene dataset, Genex-DB. Our experimental results demonstrate that (1) $\textit{Genex}$ can generate high-quality and consistent observations during long-horizon exploration of a large virtual physical world and (2) the beliefs updated with the generated observations can inform an existing decision-making model (e.g., an LLM agent) to make better plans.