In repeated games, strategies are often evaluated by their ability to guarantee the performance of the single best action that is selected in hindsight, a property referred to as \emph{Hannan consistency}, or \emph{no-regret}. However, the effectiveness of the single best action as a yardstick to evaluate strategies is limited, as any static action may perform poorly in common dynamic settings. Our work therefore turns to a more ambitious notion of \emph{dynamic benchmark consistency}, which guarantees the performance of the best \emph{dynamic} sequence of actions, selected in hindsight subject to a constraint on the allowable number of action changes. Our main result establishes that for any joint empirical distribution of play that may arise when all players deploy no-regret strategies, there exist dynamic benchmark consistent strategies such that if all players deploy these strategies the same empirical distribution emerges when the horizon is large enough. This result demonstrates that although dynamic benchmark consistent strategies have a different algorithmic structure and provide significantly enhanced individual assurances, they lead to the same equilibrium set as no-regret strategies. Moreover, the proof of our main result uncovers the capacity of independent algorithms with strong individual guarantees to foster a strong form of coordination.
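
To make the two benchmarks concrete, one possible formalization is sketched below. The notation is an assumption introduced here for illustration and is not taken from the text above: $T$ is the horizon, $\mathcal{A}$ is a player's finite action set, $u_t(a)$ is the payoff of action $a$ in period $t$ given the other players' realized actions, $a_t$ is the action actually played, and $C_T$ is the budget on the number of action changes allowed to the benchmark. The display assumes the \texttt{amsmath} package.
\begin{align*}
  % Static benchmark: best single action in hindsight (assumed notation).
  R_T &= \max_{a \in \mathcal{A}} \sum_{t=1}^{T} u_t(a) \;-\; \sum_{t=1}^{T} u_t(a_t), \\
  % Dynamic benchmark: best sequence with at most C_T action changes (assumed notation).
  R_T^{C_T} &= \max_{\substack{(x_1,\dots,x_T) \in \mathcal{A}^T \\ \sum_{t=1}^{T-1} \mathbf{1}\{x_{t+1} \neq x_t\} \le C_T}} \sum_{t=1}^{T} u_t(x_t) \;-\; \sum_{t=1}^{T} u_t(a_t).
\end{align*}
Under this sketch, a strategy is no-regret if $\limsup_{T \to \infty} R_T / T \le 0$ against every sequence of opponents' actions, and dynamic benchmark consistent if $\limsup_{T \to \infty} R_T^{C_T} / T \le 0$, where the limit is read almost surely or in expectation depending on the convention adopted. Setting $C_T = 0$ recovers the static benchmark, so $R_T \le R_T^{C_T}$ for every $C_T \ge 0$, which is the sense in which the dynamic guarantee is the stronger one. The sketch assumes a sublinear budget, $C_T = o(T)$.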