Extensive-form games have been studied extensively in recent years. They can represent games with multiple decision points and incomplete information, and are therefore useful for modeling games with uncertain inputs, such as poker. We consider two-player zero-sum extensive-form games, i.e., games in which the players' payoffs always sum to zero. In such games, finding an optimal strategy can be formulated as a bilinear saddle-point problem. This formulation becomes very large as the game grows, since it has variables representing the strategy at every decision point of each player. To solve such large-scale bilinear saddle-point problems, the excessive gap technique (EGT), a smoothing method, has been studied. This method generates a sequence of approximate solutions whose error is guaranteed to converge at a rate of $\mathcal{O}(1/k)$, where $k$ is the number of iterations. However, its theoretical error bound depends poorly on the game size, which makes it impractical for large games. Our goal is to improve the smoothing method for solving extensive-form games so that it can be applied to large-scale games. To this end, we make two contributions. First, we slightly modify the strongly convex function used in the smoothing method to improve the dependence of the theoretical bound on the game size. Second, we propose a heuristic called the centering trick, which allows the smoothing method to be combined with other methods and thereby accelerates convergence in practice. Using this trick, we combine EGT with CFR+, a state-of-the-art method for extensive-form games, and achieve good performance in games where conventional smoothing methods do not perform well. The proposed smoothing method is shown to have the potential to solve large games in practice.
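To make the bilinear saddle-point formulation concrete, the following minimal sketch uses a matrix game, $\min_x \max_y x^\top A y$ with $x$ and $y$ on probability simplices, as the simplest special case. It is an illustration only, not the method proposed in this work: the function name `saddle_point_gap`, the rock-paper-scissors payoff matrix, and the sign convention (rows minimize, columns maximize) are assumptions introduced here. In a genuine extensive-form game the simplices would be replaced by the players' strategy polytopes, which this sketch does not model.

```python
# Illustrative sketch (assumed names and payoffs, not taken from the paper):
# a matrix game as the simplest bilinear saddle-point problem
#   min_x max_y  x^T A y,  with x and y on probability simplices.

def saddle_point_gap(A, x, y):
    """Return max_{y'} x^T A y' - min_{x'} x'^T A y.

    A[i][j] is the amount the row (minimizing) player pays the column
    (maximizing) player. The gap is nonnegative and equals zero exactly
    when (x, y) is a saddle point, i.e. a Nash equilibrium.
    """
    n_rows, n_cols = len(A), len(A[0])
    # Value of each pure column strategy against the row mix x.
    col_values = [sum(x[i] * A[i][j] for i in range(n_rows))
                  for j in range(n_cols)]
    # Value of each pure row strategy against the column mix y.
    row_values = [sum(A[i][j] * y[j] for j in range(n_cols))
                  for i in range(n_rows)]
    # On a simplex, the best response is attained at a pure strategy.
    return max(col_values) - min(row_values)


# Rock-paper-scissors payoffs (hypothetical example data).
RPS = [[0, 1, -1],
       [-1, 0, 1],
       [1, -1, 0]]

uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

gap_uniform = saddle_point_gap(RPS, uniform, uniform)   # equilibrium pair
gap_rock = saddle_point_gap(RPS, always_rock, uniform)  # exploitable row mix
```

The gap computed here is the kind of error measure that an $\mathcal{O}(1/k)$ guarantee refers to: an iterative solver produces strategy pairs whose gap shrinks as the iteration count $k$ grows. The uniform pair has zero gap, while always playing the first row can be exploited by the column player.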