We introduce RL4CO, an extensive benchmark for reinforcement learning (RL) applied to combinatorial optimization (CO). RL4CO builds on state-of-the-art software libraries and implementation best practices, such as modularity and configuration management, so that researchers can efficiently and easily adapt neural network architectures, environments, and RL algorithms. In contrast to the prevailing focus on performance assessment for specific tasks such as the traveling salesman problem (TSP), we emphasize the importance of scalability and generalization across diverse CO tasks. We also systematically benchmark the zero-shot generalization, sample efficiency, and adaptability to changes in data distribution of various models. Our experiments show that some recent state-of-the-art (SOTA) methods fall behind their predecessors when evaluated on these metrics, suggesting the need for a more balanced view of the performance of neural CO (NCO) solvers. We hope RL4CO will encourage the exploration of novel solutions to complex real-world tasks and allow the NCO community to compare against existing methods through a standardized interface that decouples the science from the software engineering. We make our library publicly available at https://github.com/kaist-silab/rl4co.