Zero-sum games are a fundamental setting for adversarial training and decision-making in multi-agent learning (MAL). Existing methods often ensure convergence to (approximate) Nash equilibria by introducing a form of regularization. Yet regularization requires additional hyperparameters that must be carefully tuned, a task that is already challenging when the payoff structure is known and considerably harder when it is unknown or subject to change. Motivated by this problem, we repurpose a classical model from evolutionary game theory, the Brown-von Neumann-Nash (BNN) dynamics, leveraging its intrinsic convergence in zero-sum games without regularization, and provide last-iterate convergence guarantees in noisy normal-form games (NFGs). Importantly, to broaden the applicability of this approach, we develop a novel framework with theoretical guarantees that integrates the BNN dynamics into extensive-form games (EFGs) through counterfactual weighting. Furthermore, we implement an algorithm that instantiates our framework with neural function approximation, enabling scalable learning in both NFGs and EFGs. Empirical results show that our method quickly adapts to nonstationarities, outperforming the state-of-the-art regularization-based approach.
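For reference, the classical BNN dynamics take the following standard form from evolutionary game theory (this is the textbook definition; the paper's exact variant may differ). For a mixed strategy $x$ on the simplex with payoff function $u$, the excess payoff of pure strategy $i$ and the resulting flow are

$$
k_i(x) = \big[u(e_i, x) - u(x, x)\big]_+, \qquad \dot{x}_i = k_i(x) - x_i \sum_j k_j(x),
$$

where $[\cdot]_+ = \max\{\cdot, 0\}$. Strategies gain mass in proportion to their positive excess payoff, the flow preserves the simplex, and rest points coincide with Nash equilibria, with no regularization term involved.

Below is a minimal simulation sketch of an Euler discretization of these dynamics on rock-paper-scissors, a standard zero-sum NFG. The step size, horizon, and initial point are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Payoff matrix for rock-paper-scissors (a standard zero-sum NFG);
# A[i, j] is the payoff of pure strategy i against pure strategy j.
# The unique (symmetric) Nash equilibrium is the uniform mixed strategy.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

x = np.array([0.8, 0.1, 0.1])  # illustrative initial mixed strategy
dt = 0.01                      # illustrative Euler step size
for _ in range(20_000):
    payoffs = A @ x                     # u(e_i, x) for each pure strategy i
    avg = x @ payoffs                   # average payoff u(x, x)
    k = np.maximum(payoffs - avg, 0.0)  # excess payoffs k_i(x)
    x = x + dt * (k - x * k.sum())      # Euler step of the BNN dynamics

print(x)  # approaches the uniform Nash equilibrium [1/3, 1/3, 1/3]
```

Note that the update is simplex-preserving by construction: the increments sum to zero, and a coordinate at zero can only increase, so no projection step is needed.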