Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics

The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption. Motivated by this, we study a smooth variant of Q-Learning, a popular reinforcement learning dynamics which balances the agents' tendency to maximise their payoffs with their propensity to explore the state space. We examine this dynamic in games which are 'close' to network zero-sum games and find that Q-Learning converges to a neighbourhood around a unique equilibrium. The size of the neighbourhood is determined by the 'distance' to the zero-sum game, as well as the exploration rates of the agents. We complement these results by providing a method whereby, given an arbitrary network game, the 'nearest' network zero-sum game can be found efficiently. As our experiments show, these guarantees are independent of whether the dynamics ultimately reach an equilibrium, or remain non-convergent.
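The idea of a 'nearest' zero-sum game and of learning that settles near its equilibrium can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper: a two-player bimatrix game stands in for a network game, distance is measured in the Frobenius norm, and the learning rule is a discrete-time softmax (Boltzmann) Q-value update with expected payoffs, used here as a stand-in for the smooth Q-Learning dynamic. The function names `nearest_zero_sum` and `smooth_q_learning`, the example payoffs, and all parameter values are hypothetical.

```python
import math


def softmax(q, temperature):
    """Boltzmann policy: higher temperature means more exploration."""
    m = max(q)
    e = [math.exp((v - m) / temperature) for v in q]
    s = sum(e)
    return [v / s for v in e]


def nearest_zero_sum(A, B):
    """Frobenius-norm projection of a bimatrix game (A, B) onto zero-sum games.

    A[i][j] and B[i][j] are the payoffs to players 1 and 2 (assumed layout).
    Minimising ||A - Z||^2 + ||B + Z||^2 over Z gives Z = (A - B) / 2,
    at distance ||A + B|| / sqrt(2).
    """
    rows, cols = len(A), len(A[0])
    Z = [[(A[i][j] - B[i][j]) / 2.0 for j in range(cols)] for i in range(rows)]
    neg_Z = [[-z for z in row] for row in Z]
    gap = math.sqrt(sum((A[i][j] + B[i][j]) ** 2
                        for i in range(rows) for j in range(cols)))
    return Z, neg_Z, gap / math.sqrt(2.0)


def smooth_q_learning(A, B, temperature=1.0, step=0.05, iters=5000):
    """Discrete-time softmax Q-value updates using expected payoffs."""
    rows, cols = len(A), len(A[0])
    # Arbitrary non-equilibrium starting Q-values (an assumption).
    qx = [0.5] + [0.0] * (rows - 1)
    qy = [0.0] * (cols - 1) + [0.5]
    for _ in range(iters):
        x = softmax(qx, temperature)
        y = softmax(qy, temperature)
        # Expected payoff of each action against the opponent's mixed strategy.
        ux = [sum(A[i][j] * y[j] for j in range(cols)) for i in range(rows)]
        uy = [sum(B[i][j] * x[i] for i in range(rows)) for j in range(cols)]
        qx = [(1 - step) * q + step * u for q, u in zip(qx, ux)]
        qy = [(1 - step) * q + step * u for q, u in zip(qy, uy)]
    return softmax(qx, temperature), softmax(qy, temperature)


# Matching pennies with one perturbed entry, so the game is only near zero-sum.
A = [[1.0, -1.0], [-1.0, 1.0]]
B = [[-0.8, 1.0], [1.0, -1.0]]

Z, neg_Z, distance = nearest_zero_sum(A, B)
x, y = smooth_q_learning(A, B)           # learning in the original game
x_zs, y_zs = smooth_q_learning(Z, neg_Z)  # learning in the projected game
```

Comparing `x` with `x_zs` (and `y` with `y_zs`) shows how far the learned strategies in the perturbed game sit from those in its zero-sum projection, while `distance` quantifies how far the game itself is from being zero-sum. This is only a toy analogue of the abstract's claim; the paper's own distance measure, network structure, and bounds are not reproduced here.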

