Navigating socially in human environments requires more than satisfying geometric constraints, because collision-free paths may still interfere with ongoing activities or conflict with social norms. Addressing this challenge calls for analyzing interactions between agents and incorporating common-sense reasoning into planning. This paper presents a social robot navigation framework that integrates geometric planning with contextual social reasoning. The system first extracts obstacles and human dynamics to generate geometrically feasible candidate paths, then uses a fine-tuned vision-language model (VLM) to evaluate these paths against contextually grounded social expectations and select a socially optimized path for the controller. This task-specific VLM distills social reasoning from large foundation models into a smaller, more efficient model, allowing the framework to adapt in real time across diverse human-robot interaction contexts. Experiments in four social navigation contexts demonstrate that our method achieves the best overall performance, with the shortest personal-space violation duration, the least pedestrian-facing time, and no social-zone intrusions. Project page: https://path-etiquette.github.io
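The generate-then-select structure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the fine-tuned VLM is replaced here by a hand-written stand-in scorer, and the names `personal_space_penalty` and `select_path`, the waypoint representation, and the 1.2 m personal-space radius are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Path = List[Point]


def personal_space_penalty(
    path: Path, humans: Sequence[Point], radius: float = 1.2
) -> float:
    """Stand-in scorer (assumption): count the waypoints that fall inside
    any human's personal-space radius. In the paper, a fine-tuned VLM
    plays the role of the evaluator instead."""
    return sum(
        1 for p in path if any(math.dist(p, h) < radius for h in humans)
    )


def select_path(
    candidates: Sequence[Path],
    humans: Sequence[Point],
    scorer: Callable[[Path, Sequence[Point]], float] = personal_space_penalty,
) -> Path:
    """Return the candidate with the lowest penalty (first one wins ties)."""
    if not candidates:
        raise ValueError("at least one candidate path is required")
    return min(candidates, key=lambda path: scorer(path, humans))


# Two geometrically feasible candidates around one pedestrian (made-up data).
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
detour = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (3.0, 0.0)]
humans = [(1.5, 0.3)]

chosen = select_path([straight, detour], humans)
```

Passing the scorer as a parameter keeps the selection step independent of how paths are evaluated, so a learned evaluator could be swapped in without changing `select_path`.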