This paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and proves that it exhibits last-iterate convergence in both the full and noisy feedback settings. In the former, players observe the exact gradient vectors of their utility functions. In the latter, they observe only noisy gradient vectors. Even the celebrated Multiplicative Weights Update (MWU) and Optimistic MWU (OMWU) algorithms may fail to converge to a Nash equilibrium under noisy feedback. In contrast, M2WU exhibits last-iterate convergence to a stationary point near a Nash equilibrium in both feedback settings. We then prove that M2WU converges to an exact Nash equilibrium when the mutation term is adapted iteratively. We empirically confirm that M2WU outperforms MWU and OMWU in terms of exploitability and convergence rate.
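The abstract does not state the update rule, so the following is only a minimal sketch of one way a mutation term could be added to MWU under full feedback. It assumes that the mutation term perturbs each player's exact gradient by `mu * (c - pi) / pi`, pulling the strategy toward a fixed reference strategy `c` (taken to be uniform here). The function names, the perturbation form, the step size `eta`, and the mutation rate `mu` are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def m2wu_sketch(A, x, y, eta=0.1, mu=0.1, steps=2000):
    """Mutation-perturbed multiplicative weights for a zero-sum matrix game.

    A[i, j] is the row player's payoff; the column player receives -A[i, j].
    Setting mu=0 recovers plain MWU. The perturbation form is an assumption.
    """
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cx = np.full(x.size, 1.0 / x.size)  # reference strategy (assumed uniform)
    cy = np.full(y.size, 1.0 / y.size)
    for _ in range(steps):
        # Exact (full-feedback) gradients, both computed from the current strategies.
        gx = A @ y
        gy = -A.T @ x
        # Assumed mutation term: pushes each strategy toward its reference.
        gx = gx + mu * (cx - x) / x
        gy = gy + mu * (cy - y) / y
        # Multiplicative weights step followed by renormalization.
        x = x * np.exp(eta * gx)
        y = y * np.exp(eta * gy)
        x = x / x.sum()
        y = y / y.sum()
    return x, y

def exploitability(A, x, y):
    """Sum of both players' best-response gains; zero at a Nash equilibrium."""
    return float(np.max(A @ y) - np.min(A.T @ x))

# Rock-paper-scissors payoff matrix for the row player.
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])

if __name__ == "__main__":
    x, y = m2wu_sketch(RPS, [0.6, 0.3, 0.1], [0.2, 0.5, 0.3])
    print(x, y, exploitability(RPS, x, y))
```

In rock-paper-scissors with a uniform reference strategy, the stationary point of this perturbed update happens to coincide with the uniform equilibrium. In general the stationary point is only near an equilibrium, which is why the abstract describes adapting the mutation term iteratively to reach an exact one.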