Most of the literature on learning in games has focused on the restrictive setting where the underlying repeated game does not change over time. Much less is known about the convergence of no-regret learning algorithms in dynamic multiagent settings. In this paper, we characterize the convergence of \emph{optimistic gradient descent (OGD)} in time-varying games by drawing a strong connection with \emph{dynamic regret}. Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games, parameterized by the \emph{minimal} first-order variation of the Nash equilibria and the second-order variation of the payoff matrices, subsuming known results for static games. Furthermore, we establish improved \emph{second-order} variation bounds under strong convexity-concavity, as long as each game is repeated multiple times. Our results also apply to time-varying \emph{general-sum} multi-player games via a bilinear formulation of correlated equilibria; this has novel implications for meta-learning and for obtaining refined variation-dependent regret bounds, addressing questions left open in prior papers. Finally, we leverage our framework to provide new insights on dynamic regret guarantees in static games.
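To make the setting concrete, the following is a minimal sketch of OGD run by both players of a time-varying zero-sum matrix game, recording the equilibrium (duality) gap of each round with respect to that round's payoff matrix. It is an illustration under stated assumptions, not the paper's experimental setup: the payoff sequence `payoff_matrix` (matching pennies plus a drift that decays as $1/t^2$), the step size `eta`, the horizon `T`, and the initial strategies are all hypothetical choices made here.

```python
def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        css += uk
        t = (css - 1.0) / k
        if uk - t > 0:
            theta = t  # the last k satisfying the test fixes the threshold
    return [max(vi - theta, 0.0) for vi in v]


def mat_vec(A, y):
    """Compute A y."""
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def mat_t_vec(A, x):
    """Compute A^T x."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]


def equilibrium_gap(A, x, y):
    """Duality gap of (x, y) when x minimizes and y maximizes x^T A y."""
    return max(mat_t_vec(A, x)) - min(mat_vec(A, y))


def payoff_matrix(t):
    """Hypothetical time-varying game: matching pennies plus a 1/t^2 drift."""
    base = [[1.0, -1.0], [-1.0, 1.0]]
    drift = [[0.5, 0.0], [0.0, -0.5]]
    return [[base[i][j] + drift[i][j] / t**2 for j in range(2)] for i in range(2)]


def run_ogd(T=2000, eta=0.1):
    """Run OGD for both players; return the last strategies and all gaps."""
    x_hat, y_hat = [0.9, 0.1], [0.2, 0.8]  # assumed initial secondary iterates
    m_x, m_y = [0.0, 0.0], [0.0, 0.0]      # predictions: previous gradients
    gaps = []
    for t in range(1, T + 1):
        A = payoff_matrix(t)
        # Optimistic step: move from the secondary iterate using the prediction.
        x = project_simplex([xi - eta * g for xi, g in zip(x_hat, m_x)])
        y = project_simplex([yi + eta * g for yi, g in zip(y_hat, m_y)])
        gaps.append(equilibrium_gap(A, x, y))
        # Observed gradients of x^T A y with respect to x and y in round t.
        g_x, g_y = mat_vec(A, y), mat_t_vec(A, x)
        # Secondary iterates are updated with the observed gradients.
        x_hat = project_simplex([xi - eta * g for xi, g in zip(x_hat, g_x)])
        y_hat = project_simplex([yi + eta * g for yi, g in zip(y_hat, g_y)])
        m_x, m_y = g_x, g_y
    return x, y, gaps


if __name__ == "__main__":
    x, y, gaps = run_ogd()
    print("final strategies:", x, y)
    print("first and last equilibrium gap:", gaps[0], gaps[-1])
```

Because the drift in this toy sequence is summable, both the variation of the payoff matrices and that of the Nash equilibria stay bounded, which is the regime in which the gap is expected to vanish. Replacing `payoff_matrix` with a sequence that keeps changing by a constant amount gives a simple way to observe the opposite behavior.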