Unlabeled motion planning involves assigning a set of robots to target locations while ensuring collision avoidance, with the aim of minimizing the total distance traveled. The problem is an essential building block for multi-robot systems in applications such as exploration, surveillance, and transportation. We address this problem in a decentralized setting where each robot knows only the positions of its $k$ nearest robots and $k$ nearest targets. This scenario combines elements of combinatorial assignment and continuous-space motion planning, posing significant scalability challenges for traditional centralized approaches. To overcome these challenges, we propose a decentralized policy learned via a Graph Neural Network (GNN). The GNN enables robots to determine (1) what information to communicate to neighbors and (2) how to integrate received information with local observations for decision-making. We train the GNN using imitation learning with the centralized Hungarian algorithm as the expert policy, and then fine-tune it using reinforcement learning to avoid collisions and enhance performance. Extensive empirical evaluations demonstrate the scalability and effectiveness of our approach. The GNN policy trained on 100 robots generalizes to scenarios with up to 500 robots, outperforming state-of-the-art solutions by 8.6\% on average and significantly surpassing greedy decentralized methods. This work lays the foundation for solving multi-robot coordination problems in settings where scalability is important.
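To make the setting concrete, the following is a minimal sketch of three ingredients named in the abstract: the local observation of the $k$ nearest robots and targets, a centralized Hungarian assignment of the kind used as the imitation-learning expert, and a greedy baseline. It is an illustration under stated assumptions rather than the implementation evaluated in the paper: the function names (`local_observation`, `hungarian`, `greedy`), the Euclidean cost matrix, and the three-robot example are all hypothetical, and the GNN policy, communication, and collision avoidance are not modeled.

```python
import math


def k_nearest(origin, points, k, skip=None):
    """Indices of the k points closest to origin (optionally skipping one index)."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(origin, points[i]),
    )
    return order[:k]


def local_observation(i, robots, targets, k):
    """Assumed local view of robot i: its k nearest robots and k nearest targets."""
    return {
        "robots": k_nearest(robots[i], robots, k, skip=i),
        "targets": k_nearest(robots[i], targets, k),
    }


def hungarian(cost):
    """Minimum-cost assignment for a square cost matrix (O(n^3) potentials form).

    Returns a list where entry `row` is the column assigned to that row.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (n + 1)   # column potentials (1-based)
    p = [0] * (n + 1)     # p[j] = row matched to column j, 0 means unmatched
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def greedy(cost):
    """Baseline: each robot, in index order, takes its nearest still-free target."""
    free = set(range(len(cost)))
    assign = []
    for row in cost:
        j = min(free, key=lambda c: row[c])
        free.remove(j)
        assign.append(j)
    return assign


def total(cost, assign):
    """Sum of robot-to-target distances under an assignment."""
    return sum(cost[i][j] for i, j in enumerate(assign))


# Hypothetical toy instance: three robots, three targets in the plane.
robots = [(0, 0), (2, 0), (10, 0)]
targets = [(1, 0), (-5, 0), (10, 1)]
cost = [[math.dist(r, t) for t in targets] for r in robots]

print(local_observation(0, robots, targets, k=2))
print(hungarian(cost), total(cost, hungarian(cost)))
print(greedy(cost), total(cost, greedy(cost)))
```

On this toy instance the greedy rule sends the first robot to its nearest target and forces the second robot onto a distant one, whereas the Hungarian assignment has the two robots swap targets for a shorter total distance. The expert requires the full cost matrix, which is exactly the global information a robot restricted to `local_observation` does not have.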