Multi-robot navigation is the task of finding trajectories for a team of robotic agents so that they reach their destinations as quickly as possible without collisions. In this work, we introduce a new problem, fair-delay multi-robot navigation, which aims not only to enable such efficient, safe travel but also to equalize travel delays among agents, where an agent's delay is measured by comparing its actual trajectory with its best possible trajectory. Learning a navigation policy for this objective requires resolving a nontrivial credit assignment problem among robotic agents with continuous action spaces. To this end, we develop a new algorithm called Navigation with Counterfactual Fairness Filter (NCF2). With NCF2, each agent performs counterfactual inference on whether it can advance toward its goal or should stay still to let other agents go. Doing so allows us to address the credit assignment problem effectively and to improve fairness in travel delays while maintaining high efficiency and safety. Extensive experiments in several challenging multi-robot navigation environments demonstrate that NCF2 is more effective than state-of-the-art fairness-aware multi-agent reinforcement learning methods. Our demo videos and code are available on the project webpage: https://omron-sinicx.github.io/ncf2/
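To make the quantities in the abstract concrete, the following minimal sketch computes a per-agent travel delay (actual travel time minus best possible travel time), a simple measure of how unequal those delays are, and a stay-or-go gate applied to a continuous action. The function names, the use of the coefficient of variation as the inequality measure, and the zero-vector convention for "staying still" are assumptions made for illustration; they are not taken from the NCF2 definitions or code.

```python
from statistics import mean, pstdev


def travel_delays(actual_times, best_times):
    """Per-agent delay: actual travel time minus best possible travel time."""
    if len(actual_times) != len(best_times):
        raise ValueError("one actual time and one best time per agent required")
    return [a - b for a, b in zip(actual_times, best_times)]


def delay_spread(delays):
    """Coefficient of variation of the delays (assumed measure; 0.0 = equal)."""
    m = mean(delays)
    if m == 0:
        # No average delay at all: treat the team as perfectly fair.
        return 0.0
    return pstdev(delays) / m


def stay_or_go(proposed_action, should_advance):
    """Pass a continuous action through, or replace it with a zero action.

    `should_advance` stands in for the outcome of the agent's counterfactual
    inference; how that decision is made is outside this sketch.
    """
    if should_advance:
        return tuple(proposed_action)
    return tuple(0.0 for _ in proposed_action)


if __name__ == "__main__":
    delays = travel_delays([12.0, 10.0, 15.0], [10.0, 8.0, 9.0])
    print("delays:", delays)
    print("spread:", delay_spread(delays))
    print("gated action:", stay_or_go((0.5, -0.2), should_advance=False))
```

Under this reading, a team whose agents all incur the same delay has a spread of zero, while a team in which one agent waits much longer than the others has a larger spread even if the average delay is unchanged.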