When learning to play an imperfect information game, it is often easier to start with the basic mechanics of the game rules. For example, one can play several practice rounds with every player's private cards revealed, to better understand the basic actions and their effects. Building on this intuition, this paper introduces {\it progressive hiding}, an algorithm that learns to play imperfect information games by first learning the basic mechanics and then progressively adding information constraints over time. Progressive hiding is inspired by methods from stochastic multistage optimization, such as scenario decomposition and progressive hedging. We prove that it enables counterfactual regret minimization (CFR) to be adapted to games that do not satisfy perfect recall. Numerical experiments illustrate that progressive hiding can achieve the optimal payoff on an emergent-communication trading game benchmark.
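To make the optimization idea behind this inspiration concrete, here is a minimal sketch of classic progressive hedging. It is not the paper's progressive hiding algorithm. The sketch uses a toy convex problem, chosen here as an assumption: minimize $\sum_s p_s (x - a_s)^2$ over one shared decision $x$. Each scenario $s$ starts with its own decision copy $x_s$, solved as if the scenario were fully known. Multipliers $w_s$ and a quadratic penalty with weight $\rho$ then gradually push the copies toward a common value $\bar{x}$, which enforces the nonanticipativity constraint. The function name `progressive_hedging` and the parameters `rho` and `iters` are illustrative choices.

```python
def progressive_hedging(targets, probs, rho=1.0, iters=200):
    """Toy progressive hedging for min_x sum_s p_s * (x - a_s)^2.

    Hypothetical illustration: each scenario s keeps its own copy x_s, and
    the multipliers w_s plus the penalty rho drive the copies toward a
    shared (nonanticipative) decision xbar.
    """
    # Iteration 0: solve each scenario alone, with no coupling constraint.
    x = list(targets)
    xbar = sum(p * xs for p, xs in zip(probs, x))
    w = [rho * (xs - xbar) for xs in x]
    for _ in range(iters):
        # Scenario subproblem: argmin_x (x - a)^2 + w*x + (rho/2)(x - xbar)^2,
        # solved in closed form by setting the derivative to zero.
        x = [(2.0 * a - ws + rho * xbar) / (2.0 + rho) for a, ws in zip(targets, w)]
        # Aggregation step: the probability-weighted consensus decision.
        xbar = sum(p * xs for p, xs in zip(probs, x))
        # Multiplier update: penalize each copy's deviation from consensus.
        w = [ws + rho * (xs - xbar) for ws, xs in zip(w, x)]
    return xbar, x
```

With `targets=[1.0, 3.0]` and `probs=[0.5, 0.5]`, the per-scenario copies start at 1 and 3 and converge to the shared optimum 2.0, the probability-weighted mean. Progressive hiding applies the same pattern to games: first solve with private information revealed, then progressively enforce that a player's strategy cannot depend on what that player should not observe.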