This paper introduces CARSS (Cooperative Attention-guided Reinforcement Subpath Synthesis), an approach to the Traveling Salesman Problem (TSP) based on cooperative Multi-Agent Reinforcement Learning (MARL). CARSS decomposes TSP solving into two complementary steps: "subpath generation" and "subpath merging." In the first step, multiple agents in a cooperative MARL framework iteratively generate subpaths. In the second, these subpaths are progressively merged into a complete cycle. By adopting a multi-agent divide-and-conquer paradigm, the algorithm aims to improve efficiency in three respects: training memory consumption, testing time, and scalability. Attention mechanisms are used for both feature embedding and parameterization, and the model is trained with the independent REINFORCE algorithm. Experiments show that, compared with single-agent alternatives, CARSS uses less GPU memory, can be trained on graphs nearly 2.5 times larger, and shows potential to scale to even larger problem sizes. Compared with standard decoding methods, CARSS also reduces testing time and optimization gaps by approximately 50% on TSP instances of up to 1000 vertices.
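To make the two-step decomposition concrete, the following is a minimal sketch of "subpath generation" followed by "subpath merging" on random 2D points. It is not the CARSS implementation: the learned attention-based MARL policy is replaced here by a nearest-neighbor heuristic, and the merging rule (join the two subpaths with the closest endpoints) is an assumption made for illustration. The names `generate_subpaths`, `merge_subpaths`, and `tour_length` are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def generate_subpaths(points, num_agents, seed=0):
    """Step 1 (stand-in): agents take turns extending their own subpath.

    Assumption: each agent greedily picks the nearest unvisited vertex.
    In CARSS this choice would come from a learned policy instead.
    """
    rng = random.Random(seed)
    unvisited = set(range(len(points)))
    starts = rng.sample(sorted(unvisited), num_agents)
    subpaths = [[s] for s in starts]
    unvisited -= set(starts)
    while unvisited:
        for path in subpaths:
            if not unvisited:
                break
            tail = points[path[-1]]
            nxt = min(unvisited, key=lambda v: dist(tail, points[v]))
            path.append(nxt)
            unvisited.remove(nxt)
    return subpaths


def merge_subpaths(points, subpaths):
    """Step 2 (stand-in): progressively merge subpaths into one path.

    Assumption: at each round, join the pair of subpaths whose endpoints
    are closest, trying both orientations of each subpath.
    """
    paths = [list(p) for p in subpaths]
    while len(paths) > 1:
        best = None
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                for a in (paths[i], paths[i][::-1]):
                    for b in (paths[j], paths[j][::-1]):
                        d = dist(points[a[-1]], points[b[0]])
                        if best is None or d < best[0]:
                            best = (d, i, j, a + b)
        _, i, j, merged = best
        paths = [p for k, p in enumerate(paths) if k not in (i, j)] + [merged]
    return paths[0]


def tour_length(points, tour):
    """Length of the closed cycle that visits `tour` in order."""
    n = len(tour)
    return sum(dist(points[tour[k]], points[tour[(k + 1) % n]]) for k in range(n))


# Usage: 50 random vertices split among 5 agents, then merged into one cycle.
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(50)]
subpaths = generate_subpaths(points, num_agents=5)
tour = merge_subpaths(points, subpaths)
print(len(subpaths), len(tour), round(tour_length(points, tour), 3))
```

The sketch only illustrates the divide-and-conquer structure: each agent handles a fraction of the vertices, and the merged result visits every vertex exactly once. It says nothing about the memory or optimality-gap figures reported above.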