Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments, and it has emerged as a particularly challenging task in Embodied AI. Existing datasets for developing ZSON algorithms do not account for dynamic obstacles, object attribute diversity, or scene text, and therefore differ noticeably from real-world conditions. To address these issues, we propose a Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments (DOZE), which comprises ten high-fidelity 3D scenes with over 18k tasks and aims to mimic complex, dynamic real-world scenarios. Specifically, DOZE scenes feature multiple moving humanoid obstacles, a wide array of open-vocabulary objects, diverse objects with distinct attributes, and informative textual hints. Moreover, unlike existing datasets, which only check for collisions between the agent and static obstacles, DOZE also detects collisions between the agent and moving obstacles. This functionality enables the evaluation of agents' collision-avoidance abilities in dynamic environments. We test four representative ZSON methods on DOZE and find substantial room for improvement in navigation efficiency, safety, and object recognition accuracy. Our dataset can be found at https://DOZE-Dataset.github.io/.
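To make the idea of checking collisions against moving obstacles concrete, the following is a minimal sketch and not DOZE's actual implementation, which the abstract does not describe. It assumes the agent and each humanoid can be approximated by a circular footprint on the ground plane, and that their positions are sampled at the same discrete time steps. The names `Disc`, `collides`, and `count_dynamic_collisions` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Disc:
    """Circular footprint on the ground plane (an assumed, simplified model)."""
    x: float
    z: float
    radius: float


def collides(agent: Disc, obstacle: Disc) -> bool:
    """Return True when the two footprints overlap or touch."""
    distance = math.hypot(agent.x - obstacle.x, agent.z - obstacle.z)
    return distance <= agent.radius + obstacle.radius


def count_dynamic_collisions(agent_path, obstacle_paths):
    """Count the time steps at which the agent touches any moving obstacle.

    agent_path: list of Disc, one per time step.
    obstacle_paths: list of lists of Disc, one inner list per obstacle,
        each aligned with agent_path by time step.
    """
    hits = 0
    for t, agent in enumerate(agent_path):
        if any(collides(agent, path[t]) for path in obstacle_paths):
            hits += 1
    return hits


# Hypothetical episode: the agent walks along +x while a humanoid approaches.
agent_path = [Disc(0.0, 0.0, 0.2), Disc(0.5, 0.0, 0.2), Disc(1.0, 0.0, 0.2)]
walker = [Disc(2.0, 0.0, 0.3), Disc(1.5, 0.0, 0.3), Disc(1.2, 0.0, 0.3)]
print(count_dynamic_collisions(agent_path, [walker]))
```

In this toy episode only the final step brings the two footprints within the sum of their radii, so a single collision step is counted. A per-step check like this can miss fast crossings that happen between samples; a swept or continuous test would be needed to catch those, and whether DOZE does so is not stated here.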