Large robot fleets are now common in warehouses and other logistics settings, where even small improvements in fleet control translate into large operational gains. In this article, we address task scheduling for lifelong Multi-Agent Pickup-and-Delivery (MAPD) and propose a hybrid method that couples learning-based global guidance with lightweight optimization. A graph neural network policy, trained via reinforcement learning, outputs a desired distribution of free agents over an aggregated warehouse graph. This signal is converted into region-to-region rebalancing by solving a minimum-cost flow problem, and the schedule is then finalized by solving small, local assignment problems, which preserves accuracy while keeping per-step latency within a 1 s compute budget. On congested warehouse benchmarks from the League of Robot Runners (LoRR) with up to 500 agents, our approach improves throughput by up to 10% over the 2024 winning scheduler while maintaining real-time execution. The results indicate that coupling graph-structured learned guidance with tractable solvers reduces congestion and yields a practical, scalable blueprint for high-throughput scheduling in large fleets.
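The rebalancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the region graph, edge costs, agent counts, and target distribution below are made-up values, the helper names (`round_to_counts`, `MinCostFlow`, `rebalance`) are hypothetical, and the learned policy is replaced by a hard-coded distribution. The sketch rounds a desired distribution to integer agent counts, then solves a minimum-cost flow with a plain successive-shortest-path routine to decide how many free agents to move between adjacent regions. The final local assignment stage is not shown.

```python
def round_to_counts(fractions, total):
    """Largest-remainder rounding of a distribution to integer counts summing to total."""
    raw = [f * total for f in fractions]
    counts = [int(x) for x in raw]
    leftover = total - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


class MinCostFlow:
    """Successive shortest paths with Bellman-Ford; adequate for small region graphs."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # edge: [to, capacity, cost, reverse_index]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t, max_flow):
        flow = cost = 0
        inf = float("inf")
        while flow < max_flow:
            dist = [inf] * self.n
            prev = [None] * self.n
            dist[s] = 0
            for _ in range(self.n - 1):  # Bellman-Ford relaxation rounds
                updated = False
                for u in range(self.n):
                    if dist[u] == inf:
                        continue
                    for i, (v, cap, c, _) in enumerate(self.graph[u]):
                        if cap > 0 and dist[u] + c < dist[v]:
                            dist[v] = dist[u] + c
                            prev[v] = (u, i)
                            updated = True
                if not updated:
                    break
            if dist[t] == inf:
                break  # no augmenting path left
            push = max_flow - flow
            v = t
            while v != s:  # find the bottleneck capacity along the path
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = t
            while v != s:  # apply the augmentation to the residual graph
                u, i = prev[v]
                self.graph[u][i][1] -= push
                self.graph[v][self.graph[u][i][3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]
        return flow, cost


def rebalance(current, desired, region_edges):
    """Return ({(from_region, to_region): agents_moved}, total_cost).

    region_edges holds undirected (u, v, cost) triples between adjacent regions.
    """
    n = len(current)
    s, t = n, n + 1  # super-source and super-sink
    mcf = MinCostFlow(n + 2)
    need = 0
    for r in range(n):
        diff = current[r] - desired[r]
        if diff > 0:
            mcf.add_edge(s, r, diff, 0)   # surplus region supplies agents
        elif diff < 0:
            mcf.add_edge(r, t, -diff, 0)  # deficit region absorbs agents
            need += -diff
    big = sum(current)  # effectively uncapacitated region-to-region edges
    refs = []
    for u, v, c in region_edges:
        for a, b in ((u, v), (v, u)):
            refs.append((a, b, len(mcf.graph[a])))
            mcf.add_edge(a, b, big, c)
    _, total = mcf.solve(s, t, need)
    moves = {}
    for a, b, idx in refs:
        used = big - mcf.graph[a][idx][1]
        if used > 0:
            moves[(a, b)] = used
    return moves, total


if __name__ == "__main__":
    current = [5, 2, 1, 0]                # free agents per region (made up)
    policy_out = [0.25, 0.25, 0.25, 0.25]  # stand-in for the learned policy output
    desired = round_to_counts(policy_out, sum(current))
    edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]  # four regions in a line, unit cost
    print(rebalance(current, desired, edges))
```

Bellman-Ford is used here only because it keeps the sketch short and dependency-free. On an aggregated graph with few regions this is cheap, but a production scheduler working under a 1 s budget would more plausibly call a dedicated network-flow solver.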