As the number of service robots and autonomous vehicles in human-centered environments grows, their requirements go beyond simply navigating to a destination. They must also account for dynamic social contexts and ensure respect and comfort for others in shared spaces, which poses significant challenges for perception and planning. In this paper, we present GSON, a group-based social navigation framework that enables mobile robots to perceive and exploit the social groups in their surroundings by leveraging the visual reasoning capability of a Large Multimodal Model (LMM). For perception, we apply visual prompting techniques to extract social relationships among pedestrians in a zero-shot manner, and we combine the result with a robust pedestrian detection and tracking pipeline to mitigate the low inference speed of the LMM. Given the perception result, the planning system is designed to avoid disrupting the current social structure. We adopt a social-structure-based mid-level planner as a bridge between global path planning and local motion planning, preserving global context while retaining reactive responsiveness. The proposed method is validated on real-world mobile robot navigation tasks involving complex social structure understanding and reasoning. Experimental results demonstrate the effectiveness of the system in these scenarios compared with several baselines.