Applying machine learning to combinatorial optimization problems (COPs) has the potential to improve both efficiency and accuracy. However, existing learning-based solvers often generalize poorly when the problem distribution or scale changes. In this paper, we propose ASP (Adaptive Staircase Policy Space Response Oracle), an approach that addresses these generalization issues and learns a universal neural solver. ASP consists of two components: Distributional Exploration, which uses Policy Space Response Oracles to improve the solver's handling of unknown distributions, and Persistent Scale Adaption, which improves scalability through curriculum learning. We test ASP on several challenging COPs, including the traveling salesman problem (TSP), the vehicle routing problem, and the prize collecting TSP, as well as on real-world instances from TSPLib and CVRPLib. Our results show that, even with the same model size and a weak training signal, ASP helps neural solvers explore and adapt to unseen distributions and varying scales, achieving superior performance. In particular, compared with the same neural solvers trained under a standard pipeline, ASP reduces the optimality gap by 90.9% and 47.43% on generated and real-world instances for TSP, and by 19% and 45.57% for CVRP.
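The abstract reports its results as decreases in the optimality gap without defining the term. The sketch below shows one common reading, under two assumptions that are not confirmed by the text: the gap is the relative excess cost over a reference (optimal or best-known) solution, and the reported percentages are relative reductions of that gap. The function names and the numbers in the usage lines are hypothetical and are not taken from the paper.

```python
def optimality_gap(solver_cost: float, reference_cost: float) -> float:
    """Relative excess cost over a reference solution, in percent.

    Assumption: reference_cost is the optimal or best-known cost.
    """
    if reference_cost <= 0:
        raise ValueError("reference_cost must be positive")
    return 100.0 * (solver_cost - reference_cost) / reference_cost


def relative_gap_reduction(baseline_gap: float, improved_gap: float) -> float:
    """Percentage by which improved_gap is smaller than baseline_gap."""
    if baseline_gap <= 0:
        raise ValueError("baseline_gap must be positive")
    return 100.0 * (baseline_gap - improved_gap) / baseline_gap


# Illustrative numbers only (not from the paper): reference tour length 100,
# a baseline solver finds 110, an improved solver finds 101.
baseline_gap = optimality_gap(110.0, 100.0)
improved_gap = optimality_gap(101.0, 100.0)
reduction = relative_gap_reduction(baseline_gap, improved_gap)
print(f"baseline gap {baseline_gap:.2f}%, improved gap {improved_gap:.2f}%, "
      f"relative reduction {reduction:.2f}%")
```

Under this reading, a large relative reduction can correspond to a small absolute change when the baseline gap is already small, so the two quantities are best reported together.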