Games have a long history as benchmarks for progress in artificial intelligence. Approaches using search and learning produced strong performance across many perfect information games, and approaches using game-theoretic reasoning and learning demonstrated strong performance for specific imperfect information poker variants. We introduce Student of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Student of Games achieves strong empirical performance in large perfect and imperfect information games, an important step towards truly general algorithms for arbitrary environments. We prove that Student of Games is sound, converging to perfect play as available computation and approximation capacity increase. Student of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker, and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.