Combinatorial optimization problems involving multiple agents are notoriously challenging due to their NP-hard nature and the need for effective agent coordination. Despite advancements in learning-based methods, existing approaches often face critical limitations, including suboptimal agent coordination, poor generalizability, and high computational latency. To address these issues, we propose Parallel AutoRegressive Combinatorial Optimization (PARCO), a reinforcement learning framework designed to efficiently construct high-quality solutions for multi-agent combinatorial tasks. To this end, PARCO integrates three key components: (1) transformer-based communication layers to enable effective agent collaboration during parallel solution construction, (2) a multiple pointer mechanism for low-latency, parallel agent decision-making, and (3) priority-based conflict handlers to resolve decision conflicts via learned priorities. We evaluate PARCO on multi-agent vehicle routing and scheduling problems, where our approach outperforms state-of-the-art learning methods and demonstrates strong generalization ability and remarkable computational efficiency. Code available at: https://github.com/ai4co/parco.
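To illustrate component (3), here is a minimal sketch of priority-based conflict handling: when several agents propose the same node during a parallel decoding step, only the highest-priority agent keeps its choice and the rest fall back to a no-op action. The function name, the scalar priority values, and the "fall back to node 0 (e.g., the depot)" rule are illustrative assumptions, not the paper's exact implementation.

```python
def resolve_conflicts(actions, priorities, noop=0):
    """Resolve conflicting parallel agent decisions by learned priority.

    actions    -- node chosen by each agent in this decoding step
    priorities -- learned priority score per agent (higher wins)
    noop       -- fallback action for agents that lose a conflict
    """
    resolved = list(actions)
    claimed = {}  # node -> index of the agent currently holding it
    for agent, node in enumerate(actions):
        if node == noop:
            continue
        holder = claimed.get(node)
        if holder is None:
            claimed[node] = agent
        elif priorities[agent] > priorities[holder]:
            resolved[holder] = noop  # lower-priority agent yields
            claimed[node] = agent
        else:
            resolved[agent] = noop
    return resolved

# Agents 1 and 2 both request node 5; agent 2 has the higher priority,
# so agent 1 is reassigned to the no-op action.
print(resolve_conflicts([3, 5, 5], [0.2, 0.1, 0.9]))  # -> [3, 0, 5]
```

In the actual framework the priorities would be produced by the learned policy rather than given as fixed scalars, but the resolution logic follows the same pattern.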