Monte Carlo Tree Search (MCTS) is a best-first sampling method used to search for optimal decisions. Its effectiveness depends on how its statistical tree is built, and the selection policy plays a crucial role in that construction. A selection policy that works particularly well in MCTS is Upper Confidence Bounds for Trees (UCT). The research community has also proposed more sophisticated bounds aimed at improving MCTS performance in specific problem domains. Thus, while MCTS UCT generally performs well, some variants may outperform it. This has motivated several efforts to evolve selection policies for use in MCTS. Although these previous works are inspiring, none has analysed in depth the circumstances under which an evolved alternative to MCTS UCT is advantageous, and most have focused on a single type of problem. In contrast, this work uses five functions of different natures, ranging from unimodal to multimodal and deceptive. We show that evolving the MCTS UCT selection policy can yield benefits in multimodal and deceptive scenarios, while standard MCTS UCT remains robust across all of the functions used in this work.
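For readers unfamiliar with the selection policy discussed above, the following is a minimal sketch of UCT in its commonly used form: the mean reward of a child plus an exploration bonus scaled by a constant. It is not taken from this work. The function names `uct_score` and `select_child`, the default constant `c = sqrt(2)`, and the choice to approximate the parent's visit count by the sum of its children's visits are all assumptions made for illustration.

```python
import math


def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Score one child with the commonly used UCT form (illustrative sketch).

    total_reward  -- sum of rewards backed up through this child
    visits        -- number of times this child has been visited
    parent_visits -- number of times the parent has been visited
    c             -- exploration constant (assumed default, problem dependent)
    """
    if visits == 0:
        # Unvisited children are tried first.
        return math.inf
    exploitation = total_reward / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(children, c=math.sqrt(2)):
    """Return the index of the child with the highest UCT score.

    children -- list of (total_reward, visits) tuples; the parent visit
                count is approximated here as the sum of child visits
                (an assumption of this sketch).
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [uct_score(r, v, parent_visits, c) for r, v in children]
    return max(range(len(children)), key=scores.__getitem__)


if __name__ == "__main__":
    stats = [(9.0, 10), (1.0, 5)]
    print(select_child(stats, c=0.0))   # pure exploitation
    print(select_child(stats, c=10.0))  # heavy exploration
```

An evolved selection policy, in the sense used above, would replace the expression inside `uct_score` with a different formula over the same statistics; the rest of the selection step would stay the same.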