We introduce RL4CO, an extensive benchmark for reinforcement learning (RL) applied to combinatorial optimization (CO). RL4CO builds on state-of-the-art software libraries and implementation best practices, such as modularity and configuration management, so that it is efficient and easy for researchers to modify when adapting neural network architectures, environments, and algorithms. In contrast to the existing focus on specific tasks such as the traveling salesman problem (TSP) for performance assessment, we underline the importance of scalability and generalization capabilities across diverse optimization tasks. We also systematically benchmark the sample efficiency, zero-shot generalization, and adaptability to changes in data distribution of various models. Our experiments show that some recent state-of-the-art methods fall behind their predecessors when evaluated using these new metrics, suggesting the need for a more balanced view of the performance of neural CO solvers. We hope RL4CO will encourage the exploration of novel solutions to complex real-world tasks, allowing comparison with existing methods through a standardized interface that decouples the science from the software engineering. We make our library publicly available at https://github.com/kaist-silab/rl4co.