Polynomial-Time Linear-Swap Regret Minimization in Imperfect-Information Sequential Games

No-regret learners seek to minimize the difference between the loss they accumulated through the actions they played and the loss they would have accumulated in hindsight had they consistently modified their behavior according to some strategy transformation function. The size of the set of transformations considered by the learner determines a natural notion of rationality. As the set of transformations each learner considers grows, the strategies played by the learners recover more complex game-theoretic equilibria, including correlated equilibria in normal-form games and extensive-form correlated equilibria in extensive-form games. At the extreme, a no-swap-regret agent is one that minimizes regret against the set of all functions from the set of strategies to itself. While it is known that the no-swap-regret condition can be attained efficiently in nonsequential (normal-form) games, determining the strongest notion of rationality that can be attained efficiently in the worst case in sequential (extensive-form) games is a longstanding open problem. In this paper we provide a positive result by showing that it is possible, in any sequential game, to retain polynomial-time (in the game tree size) iterations while achieving sublinear regret with respect to all linear transformations of the mixed strategy space, a notion called no-linear-swap regret. This notion of hindsight rationality is as strong as no-swap-regret in nonsequential games, and stronger than no-trigger-regret in sequential games. This proves the existence of a subset of extensive-form correlated equilibria robust to linear deviations, which we call linear-deviation correlated equilibria, that can be approached efficiently.
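To make the regret notion above concrete, here is a minimal Python sketch, not taken from the paper, that measures regret in hindsight for the nonsequential case only. It assumes the learner's mixed strategies lie on a probability simplex, where the linear maps from the simplex to itself are the column-stochastic matrices, so the best linear transformation in hindsight reroutes each action independently to its best replacement. The function names `external_regret` and `swap_regret` are hypothetical, and the sketch does not implement the paper's algorithm for extensive-form games.

```python
def external_regret(strategies, losses):
    """Regret against the best single fixed action in hindsight.

    strategies[t][a]: probability the learner put on action a at round t.
    losses[t][b]:     loss of action b at round t.
    """
    n = len(strategies[0])
    # Expected loss the learner actually accumulated.
    incurred = sum(
        sum(x[a] * l[a] for a in range(n)) for x, l in zip(strategies, losses)
    )
    # Accumulated loss of the best constant action.
    best_fixed = min(sum(l[b] for l in losses) for b in range(n))
    return incurred - best_fixed


def swap_regret(strategies, losses):
    """Regret against the best column-stochastic (linear) map of the simplex.

    The optimal map sends each action a to the action b that minimizes
    C[a][b] = sum_t strategies[t][a] * losses[t][b].
    """
    n = len(strategies[0])
    C = [[0.0] * n for _ in range(n)]
    for x, l in zip(strategies, losses):
        for a in range(n):
            for b in range(n):
                C[a][b] += x[a] * l[b]
    # C[a][a] is what action a actually cost; min(C[a]) is its best swap.
    return sum(C[a][a] - min(C[a]) for a in range(n))


# Two rounds, two actions: the learner always picks the action that loses.
strategies = [[1.0, 0.0], [0.0, 1.0]]
losses = [[1.0, 0.0], [0.0, 1.0]]
print(external_regret(strategies, losses))  # 1.0
print(swap_regret(strategies, losses))      # 2.0
```

In this toy run the larger transformation set yields strictly larger regret (2.0 versus 1.0), which illustrates why driving regret to zero against a richer set of transformations is a stronger rationality requirement.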
