Visual navigation, a foundational aspect of Embodied AI (E-AI), has been studied extensively in the past few years. While many 3D simulators have been introduced to support visual navigation tasks, few works have incorporated human dynamics, leaving a gap between simulation and real-world applications. Furthermore, current 3D simulators that do incorporate human dynamics have several limitations, particularly in computational efficiency, which is a key promise of E-AI simulators. To overcome these shortcomings, we introduce HabiCrowd, the first standard benchmark for crowd-aware visual navigation that integrates a crowd dynamics model with diverse human settings into photorealistic environments. Empirical evaluations demonstrate that our proposed human dynamics model achieves state-of-the-art performance in collision avoidance while exhibiting superior computational efficiency compared to its counterparts. We leverage HabiCrowd to conduct several comprehensive studies on crowd-aware visual navigation tasks and human-robot interactions. The source code and data can be found at https://habicrowd.github.io/.