Deep reinforcement learning (RL) has been shown to be effective in producing approximate solutions to some vehicle routing problems (VRPs), especially when using policies generated by encoder-decoder attention mechanisms. While these techniques have been quite successful for relatively simple problem instances, there are still under-researched and highly complex VRP variants for which no effective RL method has been demonstrated. In this work we focus on one such VRP variant, which contains multiple trucks and multi-leg routing requirements. In these problems, demand is required to move along sequences of nodes, instead of just from a start node to an end node. With the goal of making deep RL a viable strategy for real-world industrial-scale supply chain logistics, we develop new extensions to existing encoder-decoder attention models which allow them to handle multiple trucks and multi-leg routing requirements. Our models have the advantage that they can be trained for a small number of trucks and nodes, and then embedded into a large supply chain to yield solutions for larger numbers of trucks and nodes. We test our approach on a real supply chain environment arising in the operations of Japanese automotive parts manufacturer Aisin Corporation, and find that our algorithm outperforms Aisin's previous best solution.