We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret than in fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games that includes all convex-concave and monotone games), in the setting where the players only have access to noisy estimates of their individual payoff gradients. If the noise is additive, the game-theoretic and purely adversarial settings enjoy similar regret guarantees; however, if the noise is multiplicative, we show that the learners can, in fact, achieve constant regret. We achieve this faster rate via an optimistic gradient scheme with learning rate separation: the method's extrapolation and update steps are tuned to different schedules, depending on the noise profile. Subsequently, to eliminate the need for delicate hyperparameter tuning, we propose a fully adaptive method that attains nearly the same guarantees as its non-adaptive counterpart, while operating without knowledge of either the game or the noise profile.
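To make the idea of learning rate separation concrete, the following is a minimal illustrative sketch and not the method analyzed in the paper. It assumes a toy bilinear game min_x max_y x*y (a monotone game with equilibrium at the origin), constant step sizes instead of the tuned schedules mentioned above, and a simple Gaussian model of multiplicative noise. The names `payoff_field`, `noisy_field`, `optimistic_gradient`, `gamma`, `eta`, and `sigma` are all hypothetical.

```python
import math
import random


def payoff_field(x, y):
    """Individual gradient field of the toy bilinear game min_x max_y x*y.

    Player 1 descends along d(xy)/dx = y; player 2 descends along d(-xy)/dy = -x.
    """
    return (y, -x)


def noisy_field(x, y, sigma, rng):
    """Assumed multiplicative noise model: the error scales with the true gradient."""
    vx, vy = payoff_field(x, y)
    return (vx * (1.0 + sigma * rng.gauss(0.0, 1.0)),
            vy * (1.0 + sigma * rng.gauss(0.0, 1.0)))


def optimistic_gradient(x0, y0, gamma, eta, steps, sigma=0.0, seed=0):
    """Optimistic gradient with separate extrapolation (gamma) and update (eta) steps.

    Returns the distance of the final base state from the equilibrium (0, 0).
    Constant step sizes are a simplification made for this sketch.
    """
    rng = random.Random(seed)
    x, y = x0, y0
    gx, gy = 0.0, 0.0  # last gradient estimate; zero before any feedback arrives
    for _ in range(steps):
        # Extrapolation: reuse the previous estimate to form the leading state.
        lx, ly = x - gamma * gx, y - gamma * gy
        # Query noisy feedback once per round, at the leading state.
        gx, gy = noisy_field(lx, ly, sigma, rng)
        # Update: move the base state using the separate step size eta.
        x, y = x - eta * gx, y - eta * gy
    return math.hypot(x, y)


if __name__ == "__main__":
    # Noiseless run and a run with multiplicative noise, both with gamma > eta.
    print(optimistic_gradient(1.0, 1.0, gamma=0.2, eta=0.05, steps=2000))
    print(optimistic_gradient(1.0, 1.0, gamma=0.2, eta=0.05, steps=2000, sigma=0.5))
```

Taking the extrapolation step `gamma` larger than the update step `eta` is the design choice this sketch is meant to expose: the leading state probes aggressively, while the base state moves conservatively and so absorbs less of the noise. How the two schedules should actually be tuned for each noise profile is the subject of the paper and is not captured here.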