In this paper, we present Neural k-Opt (NeuOpt), a novel learning-to-search (L2S) solver for routing problems. NeuOpt learns to perform flexible k-opt exchanges using a tailored action factorization method and a customized recurrent dual-stream decoder. As a pioneering effort to move beyond the pure feasibility masking scheme and enable autonomous exploration of both feasible and infeasible regions, we further propose the Guided Infeasible Region Exploration (GIRE) scheme, which supplements the NeuOpt policy network with feasibility-related features and leverages reward shaping to steer reinforcement learning more effectively. Additionally, we equip NeuOpt with Dynamic Data Augmentation (D2A) to diversify the search during inference. Extensive experiments on the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) demonstrate that NeuOpt not only significantly outperforms existing (masking-based) L2S solvers, but also surpasses learning-to-construct (L2C) and learning-to-predict (L2P) solvers. Notably, we offer fresh perspectives on how neural solvers can handle VRP constraints. Our code is available at https://github.com/yining043/NeuOpt.
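For readers unfamiliar with k-opt exchanges, the following is a minimal sketch of the classical 2-opt move (the k = 2 case) on a small TSP instance. It only illustrates the kind of exchange a search policy selects; it is not NeuOpt's learned policy, action factorization, or decoder. The function names `tour_length` and `two_opt_move` and the four-city instance are assumptions made for illustration.

```python
import math


def tour_length(tour, coords):
    """Length of the closed tour that visits coords in the given order."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
        for i in range(n)
    )


def two_opt_move(tour, i, j):
    """Apply one 2-opt exchange (hypothetical helper, illustration only).

    Removes edges (tour[i], tour[i+1]) and (tour[j], tour[(j+1) % n]),
    then reconnects the tour by reversing the segment tour[i+1..j].
    """
    assert 0 <= i < j < len(tour)
    return tour[: i + 1] + tour[i + 1 : j + 1][::-1] + tour[j + 1 :]


# Four cities on the unit square; the initial tour crosses itself.
coords = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
tour = [0, 1, 2, 3]

# Remove edges (0, 1) and (2, 3); add edges (0, 2) and (1, 3).
improved = two_opt_move(tour, 0, 2)

print(tour, round(tour_length(tour, coords), 3))
print(improved, round(tour_length(improved, coords), 3))
```

A general k-opt exchange removes k edges and reconnects the resulting segments; in an L2S solver the choice of which edges to exchange is made by the policy network rather than by exhaustive enumeration.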