The Traveling Salesman Problem (TSP), a classic routing optimization problem that originated in transportation and logistics, has become a critical task in broader domains such as manufacturing and biology. Recently, Deep Reinforcement Learning (DRL) has been increasingly employed to solve TSP because of its high inference efficiency. Nevertheless, most existing end-to-end DRL algorithms perform well only on small TSP instances and generalize poorly to large ones, because memory consumption and computation time grow sharply with problem size. In this paper, we propose a novel end-to-end DRL approach, referred to as Pointerformer, based on a multi-pointer Transformer. In particular, Pointerformer adopts a reversible residual network in the encoder and a multi-pointer network in the decoder to contain the memory consumption of the encoder-decoder architecture. To further improve the quality of TSP solutions, Pointerformer employs a feature augmentation method that explores the symmetries of TSP at both the training and inference stages, together with an enhanced context embedding approach that includes more comprehensive context information in the query. Extensive experiments on a randomly generated benchmark and a public benchmark show that, while achieving results comparable to state-of-the-art (SOTA) DRL approaches on most small-scale TSP instances, Pointerformer also generalizes well to large-scale TSPs.
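The symmetry that the feature augmentation relies on can be illustrated with a small check: rotating or reflecting all city coordinates leaves the length of any tour unchanged, so each transformed copy is an equally valid view of the same instance. The sketch below is a minimal illustration under assumptions, not the augmentation used in Pointerformer. The helper names `tour_length` and `augment`, the choice of eight views (four rotations, each with and without a reflection), and the rotation about the center of the unit square are all hypothetical.

```python
import math
import random


def tour_length(coords, tour):
    """Total Euclidean length of the closed tour that visits coords in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def augment(coords, angle, reflect=False):
    """Rotate points about (0.5, 0.5), optionally mirroring the x-axis first.

    Hypothetical helper: both operations are isometries, so pairwise
    distances (and therefore tour lengths) are preserved.
    """
    out = []
    for x, y in coords:
        dx, dy = x - 0.5, y - 0.5
        if reflect:
            dx = -dx
        out.append((
            0.5 + dx * math.cos(angle) - dy * math.sin(angle),
            0.5 + dx * math.sin(angle) + dy * math.cos(angle),
        ))
    return out


# A random 20-city instance in the unit square and an arbitrary tour.
random.seed(0)
coords = [(random.random(), random.random()) for _ in range(20)]
tour = list(range(20))
random.shuffle(tour)

base = tour_length(coords, tour)

# Eight assumed views: rotations by 0, 90, 180, 270 degrees, with and without reflection.
views = [augment(coords, k * math.pi / 2, r) for k in range(4) for r in (False, True)]
lengths = [tour_length(v, tour) for v in views]

# Every view yields the same tour length up to floating-point error.
print(all(abs(length - base) < 1e-9 for length in lengths))
```

At inference time, one plausible use of such views is to decode a tour for each view and keep the shortest, since all views describe the same underlying instance.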