Reinforcement learning has been applied to operations research and has shown promise in solving large combinatorial optimization problems. However, existing work focuses on developing neural network architectures for specific problems. These approaches lack the flexibility to incorporate recent advances in reinforcement learning or to customize model architectures for operations research problems. In this work, we analyze end-to-end autoregressive models for vehicle routing problems and show that, with a careful re-implementation of the model architecture, these models can benefit from recent advances in reinforcement learning. In particular, we re-implemented the Attention Model and trained it with Proximal Policy Optimization (PPO) in CleanRL, achieving a speedup of at least 8 times in training time. We introduce RLOR, a flexible framework for Deep Reinforcement Learning for Operation Research. We believe that a flexible framework is key to developing deep reinforcement learning models for operations research problems. The code is publicly available at https://github.com/cpwan/RLOR.