Visual navigation, a foundational aspect of Embodied AI (E-AI), has been studied extensively in recent years. While many 3D simulators have been introduced to support visual navigation tasks, few works have incorporated human dynamics, creating a gap between simulation and real-world applications. Furthermore, the current 3D simulators that do incorporate human dynamics have several limitations, particularly in computational efficiency, which is a core promise of E-AI simulators. To overcome these shortcomings, we introduce HabiCrowd, the first standard benchmark for crowd-aware visual navigation that integrates a crowd dynamics model with diverse human settings into photorealistic environments. Empirical evaluations demonstrate that our proposed human dynamics model achieves state-of-the-art performance in collision avoidance while exhibiting superior computational efficiency compared to its counterparts. We leverage HabiCrowd to conduct several comprehensive studies on crowd-aware visual navigation tasks and human-robot interaction. The source code and data can be found at https://habicrowd.github.io/.