We introduce Limited Rollout Beam Search (LRBS), a beam search strategy for deep reinforcement learning (DRL) based improvement heuristics for combinatorial optimization. Applied to models pre-trained on the Euclidean Traveling Salesperson Problem (TSP), LRBS significantly improves both in-distribution performance and generalization to larger problem instances, achieving lower optimality gaps than existing improvement heuristics and narrowing the gap to state-of-the-art constructive methods. We also extend our analysis to two pickup and delivery TSP variants to validate our results. Finally, we use our search strategy for offline and online adaptation of the pre-trained improvement policy, which improves search performance and surpasses recent adaptive methods for constructive heuristics.