Learn to Solve Vehicle Routing Problems ASAP: A Neural Optimization Approach for Time-Constrained Vehicle Routing Problems with Finite Vehicle Fleet

Source: arXiv. Affiliation: German Aerospace Center (DLR), Institute of Transport Research, Rudower Chaussee 7, 12489 Berlin. Correspondence: Elija.deineko@dlr.de

Finding a feasible solution to the Vehicle Routing Problem (VRP) promptly is a prerequisite for efficient freight transportation, seamless logistics, and sustainable mobility. Traditional optimization methods reach their limits when confronted with the real-world complexity of VRPs, which involve numerous constraints and objectives. Recently, Neural Combinatorial Optimization (NCO), the use of generative Artificial Intelligence (AI) to solve combinatorial tasks, has demonstrated promising results and offers new perspectives. In this study, we propose an NCO approach to solve a time-constrained capacitated VRP with a finite vehicle fleet size. The approach is based on an encoder-decoder architecture, formulated in line with the Policy Optimization with Multiple Optima (POMO) protocol and trained via a Proximal Policy Optimization (PPO) algorithm. We successfully trained the policy with multiple objectives (minimizing the total distance while maximizing vehicle utilization) and evaluated it on medium and large instances, benchmarking it against state-of-the-art heuristics. The method is able to find adequate and cost-efficient solutions, showing both flexibility and robust generalization. Finally, we provide a critical analysis of the solutions generated by NCO and discuss the challenges and opportunities of this new branch of intelligent learning algorithms emerging in optimization science, focusing on freight transportation.

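To make the problem setting concrete, here is a minimal sketch of how a candidate solution to a time-constrained capacitated VRP with a finite fleet could be checked and scored. The abstract does not give the formal model, so everything here is an assumption for illustration: the time constraint is read as a maximum duration per route, the depot is node 0, distances are Euclidean, and the hypothetical `utilization_weight` blends the two objectives (shorter total distance, higher vehicle utilization) into one reward. The paper's actual reward and constraints may differ.

```python
import math


def evaluate_solution(coords, demands, routes, capacity, max_route_time,
                      fleet_size, speed=1.0, utilization_weight=0.5):
    """Check feasibility and score a set of routes (illustrative assumptions only).

    coords:  list of (x, y) points; index 0 is the depot (assumption).
    demands: demand per node; demands[0] is the depot and should be 0.
    routes:  list of routes, each a list of customer indices (depot excluded).
    """
    def dist(a, b):
        return math.hypot(coords[a][0] - coords[b][0],
                          coords[a][1] - coords[b][1])

    # Finite fleet: no more routes than available vehicles.
    feasible = len(routes) <= fleet_size
    total_distance = 0.0
    total_load = 0.0
    visited = []

    for route in routes:
        stops = [0] + list(route) + [0]  # each route starts and ends at the depot
        length = sum(dist(a, b) for a, b in zip(stops, stops[1:]))
        load = sum(demands[c] for c in route)
        # Capacity limit and (assumed) maximum route duration.
        if load > capacity or length / speed > max_route_time:
            feasible = False
        total_distance += length
        total_load += load
        visited.extend(route)

    # Every customer must be served exactly once.
    if sorted(visited) != list(range(1, len(coords))):
        feasible = False

    # Share of the deployed capacity that is actually used.
    utilization = total_load / (capacity * len(routes)) if routes else 0.0
    # Hypothetical scalarization: reward short tours and well-filled vehicles.
    reward = -total_distance + utilization_weight * utilization

    return {
        "feasible": feasible,
        "total_distance": total_distance,
        "utilization": utilization,
        "reward": reward,
    }


if __name__ == "__main__":
    coords = [(0, 0), (3, 4), (6, 8), (0, 5)]
    demands = [0, 2, 3, 4]
    result = evaluate_solution(coords, demands, routes=[[1, 2], [3]],
                               capacity=5, max_route_time=25, fleet_size=2)
    print(result)
```

In a reinforcement learning setup such as the one the abstract describes, a scalar of this kind would typically serve as the episode reward, with infeasible actions masked out during decoding rather than penalized afterwards. That is a common NCO design choice, not something the abstract confirms.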