Understanding, navigating, and exploring the 3D physical world has long been a central challenge in the development of artificial intelligence. In this work, we take a step toward this goal by introducing GenEx, a system capable of planning complex embodied world exploration, guided by a generative imagination that forms priors (expectations) about the surrounding environment. GenEx generates an entire 3D-consistent imaginative environment from as little as a single RGB image and brings it to life through panoramic video streams. Leveraging scalable 3D world data curated from Unreal Engine, our generative model is grounded in the physical world. It captures a continuous 360-degree environment with little effort, offering a boundless landscape for AI agents to explore and interact with. GenEx achieves high-quality world generation and robust loop consistency over long trajectories, and it demonstrates strong 3D capabilities such as consistency and active 3D mapping. Powered by generative imagination of the world, GPT-assisted agents are equipped to perform complex embodied tasks, including both goal-agnostic exploration and goal-driven navigation. These agents use predictive expectations about unseen parts of the physical world to refine their beliefs, simulate different outcomes based on potential decisions, and make more informed choices. In summary, we demonstrate that GenEx provides a transformative platform for advancing embodied AI in imaginative spaces and has the potential to extend these capabilities to real-world exploration.