Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments, and it has emerged as a particularly challenging task in Embodied AI. Existing datasets for developing ZSON algorithms do not account for dynamic obstacles, object attribute diversity, or scene text, and therefore differ noticeably from real-world situations. To address these issues, we propose DOZE, a Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments. DOZE comprises ten high-fidelity 3D scenes with over 18k tasks, designed to mimic complex, dynamic real-world scenarios. Specifically, DOZE scenes feature multiple moving humanoid obstacles, a wide array of open-vocabulary objects, diverse objects with distinct attributes, and informative textual hints. Moreover, unlike existing datasets that only check for collisions between the agent and static obstacles, DOZE also detects collisions between the agent and moving obstacles. This capability enables evaluation of an agent's collision avoidance in dynamic environments. We evaluate four representative ZSON methods on DOZE and find substantial room for improvement in navigation efficiency, safety, and object recognition accuracy. Our dataset is available at https://DOZE-Dataset.github.io/.