In this article, we focus on search algorithms for two-player perfect-information games, whose objective is to determine the best possible strategy, and ideally a winning strategy. Unfortunately, some game search algorithms in the literature cannot always determine a winning strategy, even with infinite search time. This is the case, for example, for Unbounded Best-First Minimax and Descent Minimax, which are core algorithms in state-of-the-art knowledge-free reinforcement learning. Both were later improved with the so-called completion technique. However, whether this technique improves these algorithms enough for them to always determine a winning strategy has remained an open question until now. To answer this question, we generalize the two algorithms (in their versions using the completion technique) into a class of algorithms, and we show that any algorithm in this class computes the best strategy. Finally, we show experimentally that the completion technique improves winning performance.