We study the vehicle routing problem with a finite time horizon, in which the objective is to maximize the number of customer requests served before the horizon expires. We present a novel routing network embedding module that produces local node embedding vectors and a context-aware global graph representation. The proposed Markov decision process for the vehicle routing problem includes the node features, the network adjacency matrix, and the edge features as components of the state space. We also feed the remaining time horizon into the embedding module to give it the proper routing context. We integrate the embedding module with a policy gradient-based deep reinforcement learning framework to solve the problem. We trained and validated the proposed routing method on real-world routing networks as well as synthetically generated Euclidean networks. Our experimental results show that our method achieves a higher customer service rate than existing routing methods, and its solution time is significantly lower than that of the existing methods.
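To make the state description above concrete, the following is a minimal sketch of how such a state might be held in code. It is an illustration under stated assumptions, not the paper's implementation: the names `RoutingState`, `feasible_actions`, and `step` are hypothetical, the single "pending request" node feature, the use of travel time as the only edge feature, and the reward of 1 per served request are all assumed here, and the learned embedding module and policy are omitted entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RoutingState:
    # All field names are hypothetical; the abstract only lists the components.
    node_features: List[List[float]]      # assumed: [pending_request] per node (1.0 or 0.0)
    adjacency: List[List[int]]            # 1 if an edge exists between two nodes
    edge_travel_time: List[List[float]]   # assumed edge feature: travel time
    current_node: int
    remaining_time: float                 # remaining finite time horizon


def feasible_actions(state: RoutingState) -> List[int]:
    """Neighbours of the current node reachable within the remaining time."""
    i = state.current_node
    return [
        j
        for j, connected in enumerate(state.adjacency[i])
        if connected and state.edge_travel_time[i][j] <= state.remaining_time
    ]


def step(state: RoutingState, next_node: int) -> Tuple[RoutingState, int]:
    """Move to next_node; assumed reward is 1 if a pending request is served."""
    if next_node not in feasible_actions(state):
        raise ValueError("next_node is not reachable within the remaining time")
    travel = state.edge_travel_time[state.current_node][next_node]
    reward = 1 if state.node_features[next_node][0] == 1.0 else 0
    # Copy the features so the previous state is left unchanged.
    new_features = [row[:] for row in state.node_features]
    new_features[next_node][0] = 0.0
    new_state = RoutingState(
        new_features,
        state.adjacency,
        state.edge_travel_time,
        next_node,
        state.remaining_time - travel,
    )
    return new_state, reward


# Toy instance: node 0 is the start, nodes 1 and 2 each hold one request.
start = RoutingState(
    node_features=[[0.0], [1.0], [1.0]],
    adjacency=[[0, 1, 1], [1, 0, 1], [1, 1, 0]],
    edge_travel_time=[[0.0, 2.0, 5.0], [2.0, 0.0, 2.0], [5.0, 2.0, 0.0]],
    current_node=0,
    remaining_time=4.0,
)
after_first, r1 = step(start, 1)
after_second, r2 = step(after_first, 2)
served = r1 + r2
```

In this toy instance the direct edge from node 0 to node 2 is infeasible at the start because its travel time exceeds the horizon, so the only way to serve both requests is to visit node 1 first. A learned policy would choose among `feasible_actions` using the embeddings rather than a fixed order.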