This paper introduces a reinforcement learning approach to the Stochastic Vehicle Routing Problem with Time Windows (SVRP), with the goal of reducing travel costs in goods delivery. We develop a novel SVRP formulation that accounts for uncertain travel costs and demands together with customer-specific time windows. An attention-based neural network, trained with reinforcement learning, is used to minimize routing costs. The approach addresses a gap in SVRP research, which has traditionally relied on heuristic methods rather than machine learning. The model outperforms the Ant-Colony Optimization algorithm, achieving a 1.73% reduction in travel costs. It also integrates external information and remains robust across diverse environments, making it a useful benchmark for future SVRP studies and industry applications.