As a challenging multi-player card game, DouDizhu has recently drawn much attention as a testbed for analyzing competition and collaboration in imperfect-information games. In this paper, we propose PerfectDou, a state-of-the-art DouDizhu AI system that dominates the game, built on an actor-critic framework with a proposed technique named perfect information distillation. In detail, we adopt a perfect-training-imperfect-execution framework: during training, the agents use global information to guide policy learning as if the game were a perfect-information game, while the trained policies can then play the imperfect-information game during actual gameplay. To this end, we characterize card and game features for DouDizhu to represent the perfect and imperfect information. To train our system, we adopt proximal policy optimization with generalized advantage estimation in a parallel training paradigm. In experiments, we show how and why PerfectDou beats all existing AI programs and achieves state-of-the-art performance.
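To make the training signal concrete, the following is a minimal sketch of generalized advantage estimation, the quantity that proximal policy optimization uses to weight policy updates. It is not PerfectDou's implementation: the function name `gae_advantages`, the default `gamma` and `lam` values, and the toy episode are assumptions for illustration. In a perfect-training-imperfect-execution setup, the `values` would presumably come from a critic that sees perfect-information features, while the policy being updated sees only imperfect-information features.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,  # discount factor (assumed default)
    lam: float = 0.95,    # GAE smoothing parameter (assumed default)
) -> List[float]:
    """Compute GAE advantages for one episode.

    `values` must hold len(rewards) + 1 entries: one critic estimate per
    step plus a bootstrap value for the state after the last step
    (0.0 if the episode ended there).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the accumulated future residuals.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of residuals with weight gamma * lam.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Hypothetical three-step episode with a single terminal reward.
toy_rewards = [0.0, 0.0, 1.0]
toy_values = [0.5, 0.5, 0.5, 0.0]  # last entry: terminal bootstrap value
toy_advantages = gae_advantages(toy_rewards, toy_values, gamma=1.0, lam=1.0)
```

With `lam=1.0` the estimate reduces to the discounted return minus the critic's value, and with `lam=0.0` it reduces to the one-step temporal-difference residual. Intermediate values trade variance against bias.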