In recent years, Monte Carlo tree search (MCTS) has been widely adopted in the game community. Combined with deep reinforcement learning, it has produced success stories in many applications. While these approaches have been implemented in various games, from simple board games to more complicated video games such as StarCraft, the use of deep neural networks requires a substantial training period. In this work, we explore online adaptivity in MCTS without requiring pre-training. We present MCTS-TD, an adaptive MCTS algorithm improved with temporal difference learning. We demonstrate our new approach on the game miniXCOM, a simplified version of XCOM, a popular commercial franchise consisting of several turn-based tactical games, and show how adaptivity in MCTS-TD improves performance against opponents.