Owing to their complex dynamics, combinatorial games are a key test case and application domain for algorithms that train game-playing agents. Coevolutionary algorithms (CoEAs) are one class of such algorithms that train through self-play. A CoEA evolves a population of individuals by iteratively selecting the strongest, as judged by their interactions with contemporaries, and using the selected individuals as parents for the next generation (via randomised mutation and crossover). However, CoEAs are difficult to apply successfully to game playing because of pathological behaviours such as cycling, an issue that is especially critical in games with intransitive payoff landscapes. Runtime analysis can provide insight into how to design CoEAs that avoid such behaviours. In this paper, we extend the scope of runtime analysis to combinatorial games. We prove a general upper bound on the number of simulated games needed for the univariate marginal distribution algorithm (UMDA), a type of CoEA, to discover an optimal strategy for an impartial combinatorial game with high probability. This result applies to any impartial combinatorial game, and for many games the implied bound is polynomial or quasipolynomial in the number of game positions. After proving the main result, we apply it to several simple, well-known games: Nim, Chomp, Silver Dollar, and Turning Turtles. As the first runtime analysis of CoEAs on combinatorial games, this result is a critical step towards a comprehensive theoretical framework for coevolution.
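To make "optimal strategy for an impartial combinatorial game" concrete, the following is a minimal sketch using Nim. It is not the paper's method and does not involve UMDA. It simply computes, by exhaustive backward induction, which positions are losing for the player to move, assuming the normal-play convention (the player who cannot move loses). Under that assumption, a position is a P-position exactly when every move leads to a non-P-position, and an optimal strategy is to move to a P-position whenever one is reachable. The function names `nim_moves`, `is_p_position`, and `optimal_move` are hypothetical and chosen for illustration only.

```python
from functools import lru_cache
from itertools import product


def nim_moves(heaps):
    """Yield every Nim position reachable in one move.

    A move removes at least one token from a single heap. Positions are
    returned as sorted tuples so equivalent heap multisets share a cache entry.
    """
    for i, h in enumerate(heaps):
        for take in range(1, h + 1):
            nxt = list(heaps)
            nxt[i] = h - take
            yield tuple(sorted(nxt))


@lru_cache(maxsize=None)
def is_p_position(heaps):
    """Return True if the player to move loses under optimal play (normal play assumed).

    A terminal position has no moves, so all() over an empty generator is True:
    the player to move cannot move and therefore loses.
    """
    return all(not is_p_position(nxt) for nxt in nim_moves(heaps))


def optimal_move(heaps):
    """Return a successor that is a P-position, or None if the position is already lost."""
    for nxt in nim_moves(heaps):
        if is_p_position(nxt):
            return nxt
    return None


if __name__ == "__main__":
    # Sanity check against Bouton's theorem: Nim P-positions are exactly
    # those whose heap sizes XOR to zero.
    for h in product(range(5), repeat=3):
        assert is_p_position(tuple(sorted(h))) == (h[0] ^ h[1] ^ h[2] == 0)
    print(optimal_move((1, 2, 4)))  # (1, 2, 3)
```

This brute-force labelling visits every reachable position, so it is only practical for small games. It serves here as a reference for what an optimal strategy looks like, not as a stand-in for the coevolutionary search analysed in the paper.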