3D simulated environments play a critical role in Embodied AI, but creating them requires expertise and extensive manual effort, which restricts their diversity and scope. To mitigate this limitation, we present Holodeck, a system that fully automatically generates 3D environments to match a user-supplied prompt. Holodeck can generate diverse scenes (e.g., arcades, spas, and museums), adjust designs for different styles, and capture the semantics of complex queries such as "apartment for a researcher with a cat" and "office of a professor who is a fan of Star Wars". Holodeck leverages a large language model (GPT-4) for common-sense knowledge about what a scene might look like and uses a large collection of 3D assets from Objaverse to populate the scene with diverse objects. To address the challenge of positioning objects correctly, we prompt GPT-4 to generate spatial relational constraints between objects and then optimize the layout to satisfy those constraints. Our large-scale human evaluation shows that annotators prefer Holodeck over manually designed procedural baselines in residential scenes and that Holodeck can produce high-quality outputs for diverse scene types. We also demonstrate an exciting application of Holodeck in Embodied AI: training agents to navigate in novel scenes such as music rooms and daycares without human-constructed data, a significant step toward general-purpose embodied agents.
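The abstract does not specify the constraint vocabulary or the optimizer, so the following is only a minimal sketch of the general idea of placing objects so that spatial relational constraints are satisfied. The constraint kinds (`near`, `far`, `left_of`), the distance thresholds, the object footprints, the grid-based room, and the exhaustive depth-first search are all illustrative assumptions, not Holodeck's actual method. The constraints are hard-coded here, whereas in the described system they would be produced by GPT-4.

```python
from itertools import product

# Hypothetical constraint format: (kind, object_a, object_b).
# Hard-coded for illustration; not the system's real vocabulary.
CONSTRAINTS = [
    ("near", "chair", "desk"),
    ("left_of", "lamp", "desk"),
    ("far", "bed", "desk"),
]

# Object footprints as (width, depth) in grid cells; illustrative values only.
SIZES = {"desk": (2, 1), "chair": (1, 1), "lamp": (1, 1), "bed": (2, 3)}
ROOM = (6, 5)  # assumed room width and depth in grid cells


def overlaps(a, b):
    """Axis-aligned rectangle overlap test; each rect is (x, y, w, d)."""
    ax, ay, aw, ad = a
    bx, by, bw, bd = b
    return ax < bx + bw and bx < ax + aw and ay < by + bd and by < ay + ad


def center(rect):
    x, y, w, d = rect
    return (x + w / 2, y + d / 2)


def satisfied(kind, a, b):
    """Check one relation between two placed rectangles (assumed thresholds)."""
    (ax, ay), (bx, by) = center(a), center(b)
    dist = abs(ax - bx) + abs(ay - by)  # Manhattan distance between centers
    if kind == "near":
        return dist <= 2
    if kind == "far":
        return dist >= 4
    if kind == "left_of":
        return ax < bx
    raise ValueError(f"unknown constraint kind: {kind}")


def score(layout):
    """Count constraints whose objects are both placed and satisfied."""
    return sum(
        1
        for kind, a, b in CONSTRAINTS
        if a in layout and b in layout and satisfied(kind, layout[a], layout[b])
    )


def solve(names, layout=None):
    """Depth-first search over grid positions.

    Non-overlap is a hard constraint; the relational constraints are soft
    and maximized. Returns the best complete layout found, or None.
    """
    if layout is None:
        layout = {}
    if not names:
        return dict(layout)
    name, rest = names[0], names[1:]
    w, d = SIZES[name]
    best = None
    for x, y in product(range(ROOM[0] - w + 1), range(ROOM[1] - d + 1)):
        rect = (x, y, w, d)
        if any(overlaps(rect, other) for other in layout.values()):
            continue
        layout[name] = rect
        candidate = solve(rest, layout)
        del layout[name]
        if candidate is not None and (best is None or score(candidate) > score(best)):
            best = candidate
            if score(best) == len(CONSTRAINTS):
                break  # every constraint is satisfied; stop searching
    return best
```

Calling `solve(["desk", "chair", "lamp", "bed"])` returns a dictionary mapping each object name to an `(x, y, w, d)` rectangle. Treating non-overlap as a hard constraint and the relations as soft ones is a design choice of this sketch: it still yields a usable layout when the generated relations cannot all be satisfied at once.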