The application of learning-based methods to vehicle routing problems (VRPs) has emerged as a pivotal area of research in combinatorial optimization. These problems are characterized by vast solution spaces and intricate constraints, so traditional approaches either incur high computational overhead, as with exact mathematical models, or rely on carefully designed, complex heuristic operators to reach optimal or near-optimal solutions. Meanwhile, although some recent learning-based methods perform well on VRPs with simple constraints, they often fail to handle the hard constraints that are common in practice. This study introduces a novel end-to-end framework that combines constraint-oriented hypergraphs with reinforcement learning to address vehicle routing problems. A central innovation is a constraint-oriented dynamic hyperedge reconstruction strategy within the encoder, which significantly enhances hypergraph representation learning. The decoder, in turn, uses a double-pointer attention mechanism to generate solutions iteratively. The model is trained with asynchronous parameter updates informed by hypergraph constraints, optimizing a dual loss function that comprises a constraint loss and a policy gradient loss. Experimental results on benchmark datasets demonstrate that the proposed approach not only eliminates the need for sophisticated heuristic operators but also achieves substantial improvements in solution quality.