Generalized Nested Rollout Policy Adaptation (GNRPA) is a Monte Carlo search algorithm for optimizing a sequence of choices. We propose to improve GNRPA by avoiding overly deterministic policies, which find the same sequence of choices again and again. We do so by limiting the number of times the best sequence found at a given level may be repeated. Experiments show that this limit improves the algorithm on three different combinatorial problems: Inverse RNA Folding, the Traveling Salesman Problem with Time Windows, and the Weak Schur problem.
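The idea of limiting repetitions can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain NRPA without the bias term of GNRPA, a toy objective, and hypothetical names (`playout`, `adapt`, `nrpa`, `max_repeats`). The choice to reset the repetition counter whenever a new best sequence appears is also an assumption made for illustration.

```python
import math
import random

def playout(policy, length, n_moves, rng):
    """Sample one sequence using softmax probabilities over policy weights."""
    seq = []
    for step in range(length):
        weights = [math.exp(policy.get((step, m), 0.0)) for m in range(n_moves)]
        seq.append(rng.choices(range(n_moves), weights=weights)[0])
    return seq

def adapt(policy, seq, n_moves, alpha=1.0):
    """Return a new policy shifted toward seq (standard NRPA update)."""
    new = dict(policy)
    for step, chosen in enumerate(seq):
        weights = [math.exp(policy.get((step, m), 0.0)) for m in range(n_moves)]
        total = sum(weights)
        for m in range(n_moves):
            delta = (1.0 if m == chosen else 0.0) - weights[m] / total
            new[(step, m)] = new.get((step, m), 0.0) + alpha * delta
    return new

def nrpa(level, policy, score, length, n_moves, rng,
         iterations=20, max_repeats=3):
    """Nested search that leaves a level early once the best sequence
    has been found again max_repeats times (assumed stopping rule)."""
    if level == 0:
        seq = playout(policy, length, n_moves, rng)
        return score(seq), seq
    best_score, best_seq = -math.inf, None
    repeats = 0
    for _ in range(iterations):
        s, seq = nrpa(level - 1, policy, score, length, n_moves, rng,
                      iterations, max_repeats)
        if seq == best_seq:
            # The lower level rediscovered the current best sequence.
            repeats += 1
            if repeats >= max_repeats:
                break  # policy is likely too deterministic; stop this level
        elif s >= best_score:
            best_score, best_seq = s, seq
            repeats = 0  # assumption: the counter restarts on a new best
        policy = adapt(policy, best_seq, n_moves)
    return best_score, best_seq

# Toy objective (hypothetical): count positions that match a fixed target.
target = [0, 1, 2, 1, 0, 2]

def toy_score(seq):
    return sum(a == b for a, b in zip(seq, target))

rng = random.Random(0)
best_score, best_seq = nrpa(2, {}, toy_score, len(target), 3, rng)
print(best_score, best_seq)
```

Without the `max_repeats` check, a level whose policy has converged would spend its remaining iterations resampling the same sequence. The early exit returns that budget to the level above.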