CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

Real-life robot navigation involves more than just reaching a destination; it requires optimizing movements while addressing scenario-specific goals. An intuitive way for humans to express these goals is through abstract cues like verbal commands or rough sketches. Such human guidance may lack details or be noisy. Nonetheless, we expect robots to navigate as intended. For robots to interpret and execute these abstract instructions in line with human expectations, they must share a common understanding of basic navigation concepts with humans. To this end, we introduce CANVAS, a novel framework that combines visual and linguistic instructions for commonsense-aware navigation. Its success is driven by imitation learning, enabling the robot to learn from human navigation behavior. We present COMMAND, a comprehensive dataset with human-annotated navigation results, spanning over 48 hours and 219 km, designed to train commonsense-aware navigation systems in simulated environments. Our experiments show that CANVAS outperforms the strong rule-based system ROS NavStack across all environments, demonstrating superior performance with noisy instructions. Notably, in the orchard environment, where ROS NavStack records a 0% total success rate, CANVAS achieves a total success rate of 67%. CANVAS also closely aligns with human demonstrations and commonsense constraints, even in unseen environments. Furthermore, real-world deployment of CANVAS showcases impressive Sim2Real transfer with a total success rate of 69%, highlighting the potential of learning from human demonstrations in simulated environments for real-world applications.
