A recent paper by Farina & Pipis (2023) established the existence of uncoupled no-linear-swap regret dynamics with polynomial-time iterations in extensive-form games. The equilibrium points reached by these dynamics, known as linear correlated equilibria, are currently the tightest known relaxation of correlated equilibrium that can be learned in polynomial time in any finite extensive-form game. However, their properties remain largely unexplored, and computing them is onerous. In this paper, we make several contributions that shed light on the fundamental nature of linear-swap regret. First, we show a connection between linear deviations and a generalization of communication deviations in which the player can make queries to a "mediator" who replies with action recommendations and, critically, the player is not constrained to match the timing of the game, as would be the case for communication deviations. We call this latter set the untimed communication (UTC) deviations. We show that the UTC deviations coincide precisely with the linear deviations, and therefore that any player minimizing UTC regret also minimizes linear-swap regret. We then leverage this connection to develop state-of-the-art no-regret algorithms for computing linear correlated equilibria, both in theory and in practice. In theory, our algorithms achieve polynomially better per-iteration runtimes; in practice, they outperform the previous state of the art by several orders of magnitude.