This paper proposes a payoff perturbation technique for the Mirror Descent (MD) algorithm in games where the gradient of the payoff functions is monotone in the strategy profile space and may be contaminated by additive noise. The optimistic family of learning algorithms, exemplified by optimistic MD, successfully achieves {\it last-iterate} convergence in noise-free settings, driving the dynamics to a Nash equilibrium. A recent re-emerging trend underscores the promise of the perturbation approach, in which payoff functions are perturbed according to the distance from an anchoring, or {\it slingshot}, strategy. Building on this, we propose {\it Adaptively Perturbed MD} (APMD), which adjusts the magnitude of the perturbation by repeatedly updating the slingshot strategy at a predefined interval. This adjustment enables APMD to find a Nash equilibrium of the underlying game at guaranteed rates. Empirical results confirm that our algorithm exhibits significantly accelerated convergence.
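The mechanism described above can be illustrated with a minimal sketch. This is not the paper's exact algorithm: it uses the Euclidean special case of MD (projected gradient ascent) on a two-player zero-sum matrix game, with a linear pull toward the slingshot strategy as the perturbation, and the step size, perturbation strength, and slingshot update interval are illustrative choices.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

# Matching pennies: payoff matrix for player 1 (player 2 gets the negative).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])

eta, mu, T_sigma = 0.1, 0.5, 100    # step size, perturbation strength, slingshot interval (illustrative)
x = np.array([0.9, 0.1])            # player 1 strategy (off-equilibrium start)
y = np.array([0.2, 0.8])            # player 2 strategy
sx, sy = x.copy(), y.copy()         # slingshot (anchoring) strategies

for t in range(5000):
    # Perturbed gradients: payoff gradient plus a pull toward the slingshot.
    gx = A @ y + mu * (sx - x)
    gy = -A.T @ x + mu * (sy - y)
    x = project_simplex(x + eta * gx)
    y = project_simplex(y + eta * gy)
    if (t + 1) % T_sigma == 0:      # periodically re-anchor the slingshot
        sx, sy = x.copy(), y.copy()

# Both strategies should approach the uniform Nash equilibrium (0.5, 0.5).
print(x, y)
```

Without the perturbation term (`mu = 0`), gradient dynamics in matching pennies cycle around the equilibrium; the pull toward the slingshot makes the perturbed game strongly monotone, and periodically re-anchoring the slingshot moves the perturbed equilibrium toward the true Nash equilibrium.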