While Nash equilibrium has emerged as the central game-theoretic solution concept, many important games contain several Nash equilibria, and we must determine how to select among them in order to create real strategic agents. Several Nash equilibrium refinement concepts have been proposed and studied for sequential imperfect-information games, the most prominent being trembling-hand perfect equilibrium, quasi-perfect equilibrium, and, more recently, one-sided quasi-perfect equilibrium. These concepts are robust to certain arbitrarily small mistakes and are guaranteed to exist; however, we argue that none of them is the correct concept for developing strong agents in sequential games of imperfect information. We define a new equilibrium refinement concept for extensive-form games, called observable perfect equilibrium, in which the solution is robust to trembles in publicly observable action probabilities (and not necessarily to trembles in all action probabilities, some of which may not be observable by opposing players). Observable perfect equilibrium correctly captures the assumption that the opponent is playing as rationally as possible given the mistakes that have been observed, which previous solution concepts do not. We prove that observable perfect equilibrium is guaranteed to exist, and demonstrate that it leads to a different solution than the prior extensive-form refinements in no-limit poker. We expect observable perfect equilibrium to be a useful equilibrium refinement concept for modeling many important imperfect-information games of interest in artificial intelligence.