Bayesian games model interactive decision-making in which players have incomplete information (for example, about payoffs or about other players' private strategies and preferences) and must actively reason about and update their beliefs over such information using observations and interaction history. Existing work on counterfactual regret minimization (CFR) has shown great success in games with complete or imperfect information, but not in Bayesian games. To this end, we introduce a new CFR algorithm, Bayesian-CFR, and analyze its regret bound with respect to Bayesian Nash equilibria in Bayesian games. First, we present a method for updating the posterior distribution of beliefs about the game and other players' types. The method uses a kernel-density estimate and is shown to converge to the true distribution. Second, we define Bayesian regret and present a Bayesian-CFR minimization algorithm for computing the Bayesian Nash equilibrium. Finally, we extend this approach to other existing CFR variants, yielding Bayesian-CFR+ and Deep Bayesian CFR. Experimental results show that our proposed solutions significantly outperform existing methods in classical Texas Hold'em games.
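To make the two ingredients named in the abstract concrete, the following is a minimal illustrative sketch, not the paper's Bayesian-CFR algorithm. It assumes a finite set of opponent types, a one-dimensional observed signal, a Gaussian kernel with a fixed bandwidth, and a single decision point. All function names, the type labels, and the numbers are hypothetical. It shows a kernel-density-based Bayes update of a belief over types, followed by standard regret matching applied to regrets averaged under that belief.

```python
import math


def gaussian_kde(samples, x, bandwidth=0.5):
    """Gaussian kernel-density estimate at x from 1-D samples."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


def update_belief(prior, samples_by_type, observation, bandwidth=0.5):
    """Bayes update: posterior(type) is proportional to prior(type) * KDE likelihood."""
    unnorm = {
        t: prior[t] * gaussian_kde(samples_by_type[t], observation, bandwidth)
        for t in prior
    }
    total = sum(unnorm.values())
    if total == 0.0:
        # Observation carries no usable evidence; keep the prior unchanged.
        return dict(prior)
    return {t: w / total for t, w in unnorm.items()}


def belief_weighted_regret(belief, regret_by_type):
    """Expected cumulative regret per action under the belief over types."""
    n_actions = len(next(iter(regret_by_type.values())))
    return [
        sum(belief[t] * regret_by_type[t][a] for t in belief)
        for a in range(n_actions)
    ]


def regret_matching(cum_regret):
    """Play actions in proportion to positive regret; uniform if none is positive."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


# Hypothetical example: two opponent types and two actions.
prior = {"tight": 0.5, "loose": 0.5}
samples = {"tight": [0.1, 0.2, 0.3], "loose": [0.7, 0.8, 0.9]}
posterior = update_belief(prior, samples, observation=0.75)

regrets = {"tight": [1.0, -1.0], "loose": [-2.0, 2.0]}
strategy = regret_matching(belief_weighted_regret(posterior, regrets))
```

In this toy setting the observation lies closer to the samples of the "loose" type, so the posterior shifts toward that type and the resulting strategy favors the action with positive regret against it. A full CFR procedure would repeat these steps at every information set over many iterations and average the resulting strategies.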