In safe opponent exploitation, players aim to exploit their opponents' potentially sub-optimal strategies while guaranteeing themselves at least the value of the game in expectation. Safe opponent exploitation algorithms have been successfully applied to small instances of two-player zero-sum imperfect-information games, where Nash equilibrium strategies are typically known in advance. However, current methods for computing these strategies do not scale to large imperfect-information domains of interest such as No-Limit Texas Hold 'em (NLHE) poker, where successful agents rely on game abstractions to compute an approximate equilibrium strategy. This paper extends the concept of safe opponent exploitation by introducing prime-safe opponent exploitation, in which we redefine a player's value of the game to be the worst-case payoff to which their strategy is susceptible. This allows weaker epsilon-equilibrium strategies to benefit from a form of opponent exploitation under our revised value of the game, while retaining a practical game-theoretic guaranteed lower bound. We demonstrate the empirical advantages of our generalisation when applied to the main safe opponent exploitation algorithms.
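The revised value of the game described above can be illustrated on a small matrix game. The following is a minimal sketch, not the paper's algorithm: it assumes a two-player zero-sum game given as a payoff matrix for the row player, and the names `RPS`, `worst_case_payoff`, `nash` and `approx` are hypothetical, chosen only for this illustration. It computes the worst-case expected payoff of a fixed mixed strategy, which is the quantity the abstract proposes as the baseline a prime-safe exploiter must still guarantee.

```python
from typing import List

# Row player's payoffs in Rock-Paper-Scissors (illustrative game, not from the paper).
# Rows: our action; columns: opponent's action (Rock, Paper, Scissors).
RPS: List[List[float]] = [
    [0.0, -1.0, 1.0],   # we play Rock
    [1.0, 0.0, -1.0],   # we play Paper
    [-1.0, 1.0, 0.0],   # we play Scissors
]


def worst_case_payoff(payoffs: List[List[float]], strategy: List[float]) -> float:
    """Expected payoff of `strategy` against the opponent's best pure response.

    In a zero-sum matrix game a best response can always be found among the
    opponent's pure actions, so taking the minimum over columns is enough.
    """
    n_cols = len(payoffs[0])
    column_values = [
        sum(p * row[j] for p, row in zip(strategy, payoffs))
        for j in range(n_cols)
    ]
    return min(column_values)


# Exact equilibrium strategy: its worst case equals the value of the game (0).
nash = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical approximate equilibrium, standing in for an abstraction-based strategy.
approx = [0.4, 0.3, 0.3]

print(worst_case_payoff(RPS, nash))
print(worst_case_payoff(RPS, approx))
```

In this toy game the uniform strategy guarantees the game value of 0, whereas the approximate strategy can lose about 0.1 per round to an opponent who always plays Paper. Under the redefinition in the abstract, that worst-case figure, rather than 0, would be the lower bound preserved while exploiting the opponent.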