Combinatorial Optimization is crucial to numerous real-world applications, yet still presents challenges due to its (NP-)hard nature. Among existing approaches, heuristics often offer the best trade-off between quality and scalability, making them suitable for industrial use. While Reinforcement Learning (RL) offers a flexible framework for designing heuristics, its adoption over handcrafted heuristics remains incomplete within industrial solvers. Existing learned methods still lack the ability to adapt to specific instances and to fully leverage the available computational budget. The current best methods either rely on a collection of pre-trained policies or on data-inefficient fine-tuning, and hence fail to fully utilize newly available information within the constraints of the budget. In response, we present MEMENTO, an RL approach that leverages memory to improve the adaptation of neural solvers at inference time. MEMENTO enables updating the action distribution dynamically based on the outcome of previous decisions. We validate its effectiveness on benchmark problems, in particular the Traveling Salesman and Capacitated Vehicle Routing problems, demonstrating that it can successfully be combined with standard methods to boost their performance under a given budget, both in- and out-of-distribution, yielding improvements on all 12 evaluated tasks.