Real-world games, which involve imperfect information, multiple players, and simultaneous moves, are less frequently discussed in the game-theoretic literature. While reinforcement learning (RL) provides a general framework for extending game-theoretic algorithms, the assumptions that guarantee their convergence to Nash equilibria may no longer hold in real-world games. Starting from the definition of the Nash distribution, we construct a continuous-time dynamic named imperfect-information exponential-decay score-based learning (IESL) to find approximate Nash equilibria in games with these features. Theoretical analysis shows that IESL yields equilibrium-approaching policies in imperfect-information simultaneous games under the basic assumption of concavity. Experimental results show that IESL finds approximate Nash equilibria in four canonical poker scenarios and significantly outperforms three other representative algorithms in 3-player Leduc poker, demonstrating its equilibrium-finding ability even in practical sequential games. Furthermore, in connection with the concept of game hypomonotonicity, a trade-off between the convergence of the IESL dynamic and the ultimate NashConv of the convergent policies is observed both theoretically and experimentally.
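The abstract evaluates policies by their NashConv. As a rough illustration of that metric only (not of the IESL dynamic itself), the sketch below computes NashConv for a two-player normal-form game, assuming the standard definition: the sum over players of the gain each could obtain by switching to a best response. The paper's setting is imperfect-information, multi-player games, so this is a deliberately simplified case; the function names and the matching-pennies payoffs are hypothetical choices made for illustration.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]

# Matching pennies (illustrative payoffs, not taken from the paper).
PAYOFF_ROW: Matrix = [[1.0, -1.0], [-1.0, 1.0]]
PAYOFF_COL: Matrix = [[-1.0, 1.0], [1.0, -1.0]]


def expected_payoff(payoff: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Expected payoff x^T P y when the row player mixes with x and the column player with y."""
    return sum(
        x[i] * payoff[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


def nash_conv(
    payoff_row: Matrix,
    payoff_col: Matrix,
    x: Sequence[float],
    y: Sequence[float],
) -> float:
    """Sum of both players' best-response gains; zero exactly at a Nash equilibrium."""
    # Best pure-strategy value for the row player against the column mix y.
    best_row = max(
        sum(payoff_row[i][j] * y[j] for j in range(len(y)))
        for i in range(len(x))
    )
    # Best pure-strategy value for the column player against the row mix x.
    best_col = max(
        sum(x[i] * payoff_col[i][j] for i in range(len(x)))
        for j in range(len(y))
    )
    gain_row = best_row - expected_payoff(payoff_row, x, y)
    gain_col = best_col - expected_payoff(payoff_col, x, y)
    return gain_row + gain_col


if __name__ == "__main__":
    # Uniform play is the equilibrium of matching pennies.
    print(nash_conv(PAYOFF_ROW, PAYOFF_COL, [0.5, 0.5], [0.5, 0.5]))
    # Both players always choosing their first action is exploitable.
    print(nash_conv(PAYOFF_ROW, PAYOFF_COL, [1.0, 0.0], [1.0, 0.0]))
```

A NashConv of zero means no player can gain by deviating unilaterally, so smaller values indicate a closer approximation to a Nash equilibrium. In sequential games such as Leduc poker, the best-response values would have to be computed over the game tree rather than by the matrix products used here.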