While Nash equilibrium has emerged as the central game-theoretic solution concept, many important games contain several Nash equilibria, and we must determine how to select among them in order to create real strategic agents. Several Nash equilibrium refinement concepts have been proposed and studied for sequential imperfect-information games, the most prominent being trembling-hand perfect equilibrium, quasi-perfect equilibrium, and, more recently, one-sided quasi-perfect equilibrium. These concepts are robust to certain arbitrarily small mistakes and are guaranteed to exist; however, we argue that none of them is the correct concept for developing strong agents in sequential games of imperfect information. We define a new equilibrium refinement concept for extensive-form games, called observable perfect equilibrium, in which the solution is robust to trembles in publicly observable action probabilities (and not necessarily to trembles in all action probabilities, some of which may not be observable by opposing players). Observable perfect equilibrium correctly captures the assumption that the opponent is playing as rationally as possible given the mistakes that have been observed, which previous solution concepts do not. We prove that observable perfect equilibrium is guaranteed to exist, and demonstrate that it leads to a different solution than the prior extensive-form refinements in no-limit poker. We expect observable perfect equilibrium to be a useful equilibrium refinement concept for modeling many important imperfect-information games of interest in artificial intelligence.