Monte Carlo Tree Search (MCTS) is a sampling-based, best-first method for searching for optimal decisions. Its success depends heavily on how the statistical tree is built, and the selection policy plays a fundamental role in that process. One selection policy that works particularly well and is widely adopted in MCTS is Upper Confidence Bounds for Trees (UCT). The community has also proposed more sophisticated bounds with the goal of improving MCTS performance on particular problems. This suggests that, while MCTS with UCT generally behaves well, some variants might behave better. Accordingly, multiple works have proposed evolving the selection policy used in MCTS. Although these works are inspiring, each focuses on a single type of problem, so none has analysed in depth the circumstances under which an evolved alternative to UCT might be beneficial in MCTS. In sharp contrast, in this work we use five functions of different nature, ranging from a unimodal function through multimodal functions to deceptive functions. We demonstrate that evolving the MCTS UCT selection policy might be beneficial in multimodal and deceptive scenarios, whereas the standard MCTS UCT is robust in unimodal scenarios and competitive in the remaining scenarios used in this study.
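For readers unfamiliar with the selection policy discussed above, the following is a minimal sketch of a commonly used form of UCT, in which a child's score is its mean reward plus an exploration bonus that shrinks as the child is visited more often. The function names `uct_score` and `select_child`, the `(total_reward, visits)` tuple layout, and the default exploration constant `c = sqrt(2)` are illustrative assumptions, not details taken from this work; an evolved selection policy would replace the expression inside `uct_score`.

```python
import math


def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Score one child: mean reward plus an exploration bonus.

    The constant c is an assumed default; it is typically tuned per problem.
    """
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    exploitation = total_reward / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the index of the child with the highest UCT score.

    children is a list of (total_reward, visits) tuples (hypothetical layout).
    """
    scores = [uct_score(r, n, parent_visits, c) for r, n in children]
    return max(range(len(children)), key=scores.__getitem__)


if __name__ == "__main__":
    # Child 0 has the better mean, child 1 has been explored far less.
    children = [(9.0, 10), (1.0, 2)]
    print(select_child(children, parent_visits=12))         # exploration on
    print(select_child(children, parent_visits=12, c=0.0))  # pure exploitation
```

With `c = 0` the policy is purely greedy on the mean reward; larger values of `c` favour rarely visited children, which is the exploration and exploitation trade-off that alternative or evolved bounds try to tune.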