While Nash equilibrium has emerged as the central game-theoretic solution concept, many important games contain several Nash equilibria, and we must determine how to select among them in order to create real strategic agents. Several Nash equilibrium refinement concepts have been proposed and studied for sequential imperfect-information games, the most prominent being trembling-hand perfect equilibrium, quasi-perfect equilibrium, and, more recently, one-sided quasi-perfect equilibrium. These concepts are robust to certain arbitrarily small mistakes and are guaranteed to always exist; however, we argue that none of them is the correct concept for developing strong agents in sequential games of imperfect information. We define a new equilibrium refinement concept for extensive-form games called observable perfect equilibrium, in which the solution is robust to trembles in publicly observable action probabilities (and not necessarily to trembles in action probabilities that opposing players cannot observe). Observable perfect equilibrium correctly captures the assumption that the opponent is playing as rationally as possible given the mistakes that have been observed, which previous solution concepts do not. We prove that an observable perfect equilibrium always exists, and we demonstrate that it leads to a different solution than the prior extensive-form refinements in no-limit poker. We expect observable perfect equilibrium to be a useful equilibrium refinement concept for modeling many important imperfect-information games of interest in artificial intelligence.
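The idea of equilibrium selection via robustness to small mistakes can be illustrated on a tiny normal-form game. The sketch below is a simplified illustration only, not observable perfect equilibrium and not the extensive-form refinements discussed above: it enumerates pure-strategy Nash equilibria of a hypothetical 2x2 bimatrix game, then applies one specific perturbation (the opponent plays its equilibrium action with probability 1 - eps and spreads eps uniformly over all actions). Passing this single uniform-tremble test is a sufficient heuristic, not the full definition of trembling-hand perfection, which only requires that some sequence of fully mixed perturbations exist. The function names `pure_nash` and `survives_uniform_tremble` are hypothetical.

```python
import itertools


def pure_nash(A, B):
    """Return all pure-strategy Nash equilibria (i, j) of the bimatrix game (A, B).

    A[i][j] is the row player's payoff and B[i][j] the column player's payoff.
    """
    rows, cols = len(A), len(A[0])
    eqs = []
    for i, j in itertools.product(range(rows), range(cols)):
        row_best = all(A[i][j] >= A[k][j] for k in range(rows))
        col_best = all(B[i][j] >= B[i][k] for k in range(cols))
        if row_best and col_best:
            eqs.append((i, j))
    return eqs


def survives_uniform_tremble(A, B, i, j, eps=1e-3):
    """Heuristic check (assumption: one uniform tremble of size eps).

    Each player's equilibrium action must remain a best response when the
    opponent plays its equilibrium action with probability 1 - eps and
    spreads eps uniformly over all of its actions.
    """
    rows, cols = len(A), len(A[0])
    col_mix = [eps / cols + (1 - eps) * (k == j) for k in range(cols)]
    row_mix = [eps / rows + (1 - eps) * (k == i) for k in range(rows)]
    # Expected payoff of each row action against the trembling column player.
    row_vals = [sum(A[r][c] * col_mix[c] for c in range(cols)) for r in range(rows)]
    # Expected payoff of each column action against the trembling row player.
    col_vals = [sum(B[r][c] * row_mix[r] for r in range(rows)) for c in range(cols)]
    return row_vals[i] >= max(row_vals) and col_vals[j] >= max(col_vals)


if __name__ == "__main__":
    # Hypothetical game: both players get 1 at (U, L) and 0 everywhere else.
    A = [[1, 0], [0, 0]]
    B = [[1, 0], [0, 0]]
    eqs = pure_nash(A, B)
    print(eqs)  # [(0, 0), (1, 1)]
    print([e for e in eqs if survives_uniform_tremble(A, B, *e)])  # [(0, 0)]
```

Both (U, L) and (D, R) are Nash equilibria, but (D, R) relies on a weakly dominated action: once the opponent makes even a tiny mistake, deviating becomes strictly better, so only (U, L) survives the tremble.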