Most of the literature on learning in games has focused on the restrictive setting where the underlying repeated game does not change over time. Much less is known about the convergence of no-regret learning algorithms in dynamic multiagent settings. In this paper, we characterize the convergence of optimistic gradient descent (OGD) in time-varying games. Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games, parameterized by natural variation measures of the sequence of games and subsuming known results for static games. Furthermore, we establish improved second-order variation bounds under strong convexity-concavity, as long as each game is repeated multiple times. Our results also apply to time-varying general-sum multi-player games via a bilinear formulation of correlated equilibria, which has novel implications for meta-learning and for obtaining refined variation-dependent regret bounds, addressing questions left open in prior work. Finally, we also leverage our framework to provide new insights into dynamic regret guarantees in static games.
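To make the object of study concrete, here is a minimal sketch of projected optimistic gradient descent run by both players of a time-varying bilinear zero-sum game min_x max_y x^T A_t y, where each player uses its last observed gradient as the prediction for the next one. Several choices are assumptions made only for illustration and do not come from the abstract: strategies live in the box [-1, 1]^2 rather than in probability simplices, so projection is coordinate-wise clipping and the equilibrium (duality) gap has the closed form ||A_t^T x||_1 + ||A_t y||_1; the matrix sequence produced by the hypothetical `game_matrix` helper is invented; and the step size `eta` is illustrative.

```python
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(a: Mat, v: Vec) -> Vec:
    """Compute A v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def matTvec(a: Mat, v: Vec) -> Vec:
    """Compute A^T v."""
    return [sum(a[i][j] * v[i] for i in range(len(a))) for j in range(len(a[0]))]


def clip_box(v: Vec) -> Vec:
    """Euclidean projection onto the box [-1, 1]^n (assumed strategy set)."""
    return [min(1.0, max(-1.0, vi)) for vi in v]


def game_matrix(t: int) -> Mat:
    """Hypothetical time-varying payoff matrix A_t = [[1, c_t], [-c_t, 1]], c_t = 1/(t+1)."""
    c = 1.0 / (t + 1)
    return [[1.0, c], [-c, 1.0]]


def duality_gap(a: Mat, x: Vec, y: Vec) -> float:
    """Gap max_{y'} x^T A y' - min_{x'} x'^T A y over [-1,1]^n boxes,
    which equals ||A^T x||_1 + ||A y||_1."""
    return sum(abs(v) for v in matTvec(a, x)) + sum(abs(v) for v in matvec(a, y))


def optimistic_gd(x0: Vec, y0: Vec, eta: float, steps: int) -> Tuple[Vec, Vec, List[float]]:
    """Projected OGD for both players; returns final iterates and per-round gaps,
    each gap measured on that round's game A_t."""
    x_hat, y_hat = list(x0), list(y0)  # secondary ("hat") sequences
    mx, my = [0.0] * len(x0), [0.0] * len(y0)  # gradient predictions
    x, y = list(x0), list(y0)
    gaps: List[float] = []
    for t in range(steps):
        a = game_matrix(t)
        # Play: step from the hat iterate along the predicted gradient.
        x = clip_box([xh - eta * m for xh, m in zip(x_hat, mx)])
        y = clip_box([yh - eta * m for yh, m in zip(y_hat, my)])
        gaps.append(duality_gap(a, x, y))
        # Observe gradients of the current game at the played point.
        gx = matvec(a, y)                 # x minimizes x^T A y
        gy = [-v for v in matTvec(a, x)]  # y maximizes, so it descends on -x^T A y
        # Update the hat iterates with the observed gradients.
        x_hat = clip_box([xh - eta * g for xh, g in zip(x_hat, gx)])
        y_hat = clip_box([yh - eta * g for yh, g in zip(y_hat, gy)])
        mx, my = gx, gy  # next prediction: the last observed gradient
    return x, y, gaps


if __name__ == "__main__":
    x, y, gaps = optimistic_gd([0.5, -0.3], [0.2, 0.4], eta=0.1, steps=3000)
    print(gaps[0], gaps[-1])
```

Every A_t in this toy sequence is a scaled rotation whose unique equilibrium is (0, 0), and consecutive matrices differ less and less, so the per-round gap should shrink toward zero. Swapping in a sequence whose equilibria drift would be one way to probe the variation-dependent behavior the abstract refers to.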