Autonomous mobile robots need to perceive their environments with onboard sensors (e.g., LiDARs and RGB cameras) and then make appropriate navigation decisions. In human-inhabited public spaces, navigation involves more than obstacle avoidance: the robot must also consider surrounding humans and their intentions and adapt its navigation behavior to the underlying social norms, i.e., be socially compliant. Machine learning methods have been shown to be effective in capturing these complex and subtle social interactions in a data-driven manner, without explicitly hand-crafting simplified models or cost functions. Considering the multiple available sensor modalities and the efficiency of learning methods, this paper presents a comprehensive study on learning social robot navigation with multimodal perception using a large-scale real-world dataset. The study investigates social robot navigation decision making at both the global and local planning levels and contrasts unimodal and multimodal learning against a set of classical navigation approaches in different social scenarios, while also analyzing training and generalizability performance from the learning perspective. We also conduct a human study on how learning with multimodal perception affects perceived social compliance. The results show that multimodal learning has a clear advantage over unimodal learning in both the dataset and human studies. We open-source our code for the community's future use in studying multimodal perception for learning social robot navigation.