3D simulated environments play a critical role in Embodied AI, but creating them requires expertise and extensive manual effort, which restricts their diversity and scope. To mitigate this limitation, we present Holodeck, a system that fully automatically generates 3D environments to match a user-supplied prompt. Holodeck can generate diverse scenes (e.g., arcades, spas, and museums), adjust designs to match a requested style, and capture the semantics of complex queries such as "apartment for a researcher with a cat" and "office of a professor who is a fan of Star Wars". Holodeck leverages a large language model (GPT-4) for common-sense knowledge about what a scene might look like and uses a large collection of 3D assets from Objaverse to populate the scene with diverse objects. To address the challenge of positioning objects correctly, we prompt GPT-4 to generate spatial relational constraints between objects and then optimize the layout to satisfy those constraints. Our large-scale human evaluation shows that annotators prefer Holodeck over manually designed procedural baselines in residential scenes and that Holodeck can produce high-quality outputs for diverse scene types. We also demonstrate an application of Holodeck in Embodied AI: training agents to navigate novel scenes such as music rooms and daycares without human-constructed data, a significant step toward general-purpose embodied agents.
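To make the constrain-then-optimize idea from the abstract concrete, the following is a minimal sketch rather than Holodeck's actual solver. It assumes a tiny grid-shaped room, three hypothetical objects with made-up footprints, and three hypothetical constraint types (`against_wall`, `near`, `left_of`) of the kind a language model might be prompted to emit. The layout is chosen by exhaustive search over grid positions, which is only feasible for a handful of objects; overlap and room bounds are treated as hard requirements, and the relational constraints are counted as a score.

```python
from itertools import product

# Hypothetical room size and object footprints (width, depth) in grid cells.
ROOM_W, ROOM_D = 6, 4
SIZES = {"desk": (2, 1), "chair": (1, 1), "bookshelf": (1, 2)}

# Hypothetical relational constraints, standing in for LLM output.
CONSTRAINTS = [
    ("against_wall", "desk"),
    ("near", "chair", "desk"),
    ("left_of", "bookshelf", "desk"),
]

def rect(name, pos):
    """Axis-aligned footprint (x0, y0, x1, y1) of an object placed at pos."""
    x, y = pos
    w, d = SIZES[name]
    return x, y, x + w, y + d

def overlaps(a, b):
    """True if two footprints share interior area (touching edges is fine)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def center(name, pos):
    x0, y0, x1, y1 = rect(name, pos)
    return (x0 + x1) / 2, (y0 + y1) / 2

def satisfied(constraint, layout):
    """Check one relational constraint against a candidate layout."""
    kind = constraint[0]
    if kind == "against_wall":
        x0, y0, x1, y1 = rect(constraint[1], layout[constraint[1]])
        return x0 == 0 or y0 == 0 or x1 == ROOM_W or y1 == ROOM_D
    ax, ay = center(constraint[1], layout[constraint[1]])
    bx, by = center(constraint[2], layout[constraint[2]])
    if kind == "near":
        return abs(ax - bx) + abs(ay - by) <= 2  # Manhattan distance threshold
    if kind == "left_of":
        return ax < bx
    raise ValueError(f"unknown constraint: {kind}")

def valid_positions(name):
    """All grid positions that keep the object fully inside the room."""
    w, d = SIZES[name]
    return [(x, y) for x in range(ROOM_W - w + 1) for y in range(ROOM_D - d + 1)]

def solve():
    """Return the collision-free layout satisfying the most constraints."""
    names = list(SIZES)
    best, best_score = None, -1
    for combo in product(*(valid_positions(n) for n in names)):
        layout = dict(zip(names, combo))
        rects = [rect(n, layout[n]) for n in names]
        if any(overlaps(rects[i], rects[j])
               for i in range(len(rects)) for j in range(i + 1, len(rects))):
            continue  # hard requirement: objects must not collide
        score = sum(satisfied(c, layout) for c in CONSTRAINTS)
        if score > best_score:
            best, best_score = layout, score
    return best, best_score

if __name__ == "__main__":
    layout, score = solve()
    print(f"satisfied {score}/{len(CONSTRAINTS)} constraints: {layout}")
```

Scoring the relational constraints instead of requiring all of them means the search still returns a usable layout when the generated constraints conflict with one another, which seems a reasonable design choice when the constraints come from a language model.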