This paper presents a payoff perturbation technique that introduces strong convexity into players' payoff functions in games. The technique is designed for first-order methods to achieve last-iterate convergence in games whose payoff gradients are monotone over the strategy profile space, possibly with additive noise. Although perturbation is known to facilitate the convergence of learning algorithms, its magnitude must be tuned carefully to guarantee last-iterate convergence. Previous studies have proposed a scheme in which the magnitude is determined by the distance from an anchoring (or reference) strategy that is periodically re-initialized. Building on this, we propose Gradient Ascent with Boosting Payoff Perturbation, which incorporates a novel perturbation into the underlying payoff function while retaining the periodically re-initialized anchoring strategy scheme. This innovation yields faster last-iterate convergence rates than existing payoff-perturbed algorithms, even in the presence of additive noise.
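To illustrate the general idea described above, the following is a minimal sketch of gradient ascent with a payoff perturbation whose magnitude is tied to the distance from a periodically re-initialized anchoring strategy. All function names, parameter names, and numeric values (step size, perturbation strength, anchor period) are illustrative assumptions, not the paper's actual algorithm or constants.

```python
import numpy as np

def perturbed_gradient_ascent(grad, x0, eta=0.05, mu=0.1,
                              anchor_period=50, steps=500):
    """Hypothetical sketch: ascend on a perturbed payoff.

    Adding the strongly concave penalty -(mu/2) * ||x - anchor||^2 to the
    payoff contributes the term -mu * (x - anchor) to its gradient, so the
    perturbation's pull grows with the distance from the anchor strategy.
    """
    x = np.asarray(x0, dtype=float)
    anchor = x.copy()  # anchoring / reference strategy
    for t in range(steps):
        if t > 0 and t % anchor_period == 0:
            anchor = x.copy()  # periodically re-initialize the anchor
        x = x + eta * (grad(x) - mu * (x - anchor))
    return x

# Toy monotone example: payoff f(x) = -||x||^2 / 2 has gradient -x,
# which is monotone; the unique equilibrium is x = 0.
x_star = perturbed_gradient_ascent(lambda x: -x, x0=[1.0, -1.0])
```

In this sketch the iterate is attracted toward the anchor between re-initializations, and moving the anchor to the current iterate each period lets the sequence keep progressing toward the equilibrium rather than stalling at a perturbed fixed point.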