The Traveling Salesman Problem (TSP) is a well-known combinatorial optimization problem with broad real-world applications. Recently, neural networks have gained popularity in this research area because they provide strong heuristic solutions to TSPs. Compared with autoregressive neural approaches, non-autoregressive (NAR) networks exploit inference parallelism to increase inference speed, but they suffer from comparatively low solution quality. In this paper, we propose a novel NAR model named NAR4TSP, which incorporates a specially designed architecture and an enhanced reinforcement learning (RL) strategy. To the best of our knowledge, NAR4TSP is the first TSP solver that successfully combines RL and NAR networks. The key lies in incorporating the decoding of NAR network outputs into the training process. NAR4TSP efficiently represents TSP encoded information as rewards and seamlessly integrates it into the RL strategy, while maintaining consistent TSP sequence constraints during both the training and testing phases. Experimental results on both synthetic and real-world TSP instances demonstrate that NAR4TSP outperforms four state-of-the-art models in terms of solution quality, inference speed, and generalization to unseen scenarios.
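The abstract does not specify how NAR outputs are decoded during training. As a hedged illustration only, assume the network emits an n×n matrix of edge scores in which entry (i, j) scores travelling from city i to city j. This is a common NAR design but is not confirmed for NAR4TSP. The sketch below shows how a greedy decoder with a visited-city mask enforces the TSP constraint that every city appears exactly once, and how the resulting closed-tour length can serve as a negative reward. The names `greedy_decode`, `tour_length`, and `reward` are hypothetical and are not the authors' procedure.

```python
import numpy as np


def greedy_decode(prob: np.ndarray, start: int = 0) -> list[int]:
    """Decode an n x n edge-score matrix into a valid tour.

    Assumption (hypothetical): prob[i, j] scores the move i -> j.
    Visited cities are masked with -inf, so each city is chosen exactly once.
    """
    n = prob.shape[0]
    visited = np.zeros(n, dtype=bool)
    tour = [start]
    visited[start] = True
    for _ in range(n - 1):
        # Mask already-visited cities, then pick the best-scoring unvisited one.
        scores = np.where(visited, -np.inf, prob[tour[-1]])
        nxt = int(np.argmax(scores))
        tour.append(nxt)
        visited[nxt] = True
    return tour


def tour_length(coords: np.ndarray, tour: list[int]) -> float:
    """Euclidean length of the closed tour (the last city returns to the first)."""
    ordered = coords[tour]
    diffs = ordered - np.roll(ordered, -1, axis=0)
    return float(np.linalg.norm(diffs, axis=1).sum())


def reward(coords: np.ndarray, prob: np.ndarray) -> tuple[float, list[int]]:
    """Hypothetical RL reward: shorter tours give larger (less negative) rewards."""
    tour = greedy_decode(prob)
    return -tour_length(coords, tour), tour
```

For example, on the four corners of a unit square, using negative pairwise distances as stand-in scores makes the decoder behave as nearest-neighbor search. In that case it returns the perimeter tour `[0, 1, 2, 3]` with reward `-4.0`. In an actual RL setup, the reward would typically be paired with the log-probabilities of the chosen edges in a policy-gradient update. That step is omitted here because the abstract does not describe it.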