Polynomial-Time Linear-Swap Regret Minimization in Imperfect-Information Sequential Games

No-regret learners seek to minimize the difference between the loss they accumulated through the actions they played and the loss they would have accumulated in hindsight had they consistently modified their behavior according to some strategy transformation function. The size of the set of transformations considered by the learner determines a natural notion of rationality. As the set of transformations each learner considers grows, the strategies played by the learners recover more complex game-theoretic equilibria, including correlated equilibria in normal-form games and extensive-form correlated equilibria in extensive-form games. At the extreme, a no-swap-regret agent is one that minimizes regret against the set of all functions from the set of strategies to itself. While it is known that the no-swap-regret condition can be attained efficiently in nonsequential (normal-form) games, determining the strongest notion of rationality that can be attained efficiently in the worst case in sequential (extensive-form) games is a longstanding open problem. In this paper we provide a positive result by showing that it is possible, in any sequential game, to keep each iteration polynomial-time in the size of the game tree while achieving sublinear regret with respect to all linear transformations of the mixed strategy space, a notion called no-linear-swap regret. This notion of hindsight rationality is as strong as no-swap-regret in nonsequential games, and stronger than no-trigger-regret in sequential games. Our result thereby proves the existence of a subset of extensive-form correlated equilibria robust to linear deviations, which we call linear-deviation correlated equilibria, that can be approached efficiently.
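
For concreteness, the regret notion described above can be sketched as a formula. The notation is an assumption made here for illustration and does not appear in the abstract: $\mathcal{X}$ is the learner's mixed strategy space, $x_t \in \mathcal{X}$ is the strategy played at round $t$, $\ell_t$ is the loss vector at round $t$ (losses are assumed linear in the strategy), and $\Phi$ is the set of strategy transformations $\phi : \mathcal{X} \to \mathcal{X}$ the learner considers.

\[
R^T_{\Phi} \;=\; \max_{\phi \in \Phi} \; \sum_{t=1}^{T} \Big( \langle \ell_t, x_t \rangle - \langle \ell_t, \phi(x_t) \rangle \Big)
\]

Under this notation, no-swap regret would correspond to taking $\Phi$ to be all functions from $\mathcal{X}$ to itself, and no-linear-swap regret to taking $\Phi$ to be the linear maps from $\mathcal{X}$ to itself. In both cases the learner's goal is sublinear growth, $R^T_{\Phi} = o(T)$.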
