A major challenge in decision-making domains with large state spaces is to effectively select actions that maximize utility. In recent years, approaches such as reinforcement learning (RL) and search algorithms have been successful in tackling this issue, despite their differences. RL defines a learning framework in which an agent explores and interacts with its environment. Search algorithms provide a formalism to search for a solution. However, it is often difficult to evaluate the performance of such approaches in a practical way. Motivated by this problem, we focus on one game domain, namely Connect-4, and develop a novel evolutionary framework to evaluate three classes of algorithms: RL, Minimax and Monte Carlo tree search (MCTS). The contribution of this paper is threefold: i) we implement advanced versions of these algorithms and provide a systematic comparison with their standard counterparts, ii) we develop a novel evaluation framework, which we call the Evolutionary Tournament, and iii) we conduct an extensive evaluation of the relative performance of each algorithm. We evaluate different metrics and show that MCTS achieves the best results in terms of win percentage, whereas Minimax and Q-Learning rank second and third, respectively, although the latter is shown to be the fastest to make a decision.