Nash Q-learning can be regarded as one of the earliest and best-known algorithms in multi-agent reinforcement learning (MARL) for learning policies that constitute a Nash equilibrium of an underlying general-sum Markov game. Its original analysis covered only the tabular case and provided asymptotic guarantees. More recently, finite-sample guarantees have been established for the tabular case using more modern RL techniques. Our work analyzes Nash Q-learning with linear function approximation, a representation regime used when the state space is large or continuous, and provides finite-sample guarantees that characterize its sample efficiency. We find that the resulting performance nearly matches an existing efficient result for single-agent RL under the same representation and has a polynomial gap relative to the best-known result for the tabular case.
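For orientation, the two ingredients named in the abstract can be written out as follows. This is a sketch in standard textbook notation, not necessarily the notation or the exact algorithm analyzed in the paper: the feature map $\phi$, the weight vectors $w^i$, the dimension $d$, the step size $\alpha_t$, and the discount factor $\gamma$ are all assumptions introduced here for illustration.

```latex
% Tabular Nash Q-learning update for agent i, with joint action a = (a^1, ..., a^n):
Q^i_{t+1}(s, a) = (1 - \alpha_t)\, Q^i_t(s, a)
  + \alpha_t \left[ r^i_t + \gamma\, \mathrm{Nash}Q^i_t(s') \right],

% where (\pi^1, ..., \pi^n) is a Nash equilibrium of the stage game
% (Q^1_t(s', \cdot), ..., Q^n_t(s', \cdot)) and
\mathrm{Nash}Q^i_t(s') = \sum_{a'} \Big( \prod_{j=1}^{n} \pi^j(a'^j \mid s') \Big)\, Q^i_t(s', a').

% Linear function approximation replaces the table with a d-dimensional parametrization:
Q^i(s, a) \approx \phi(s, a)^{\top} w^i,
\qquad \phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d, \quad w^i \in \mathbb{R}^d.
```

Under this parametrization the learner estimates $d$ weights per agent instead of one value per state and joint action, which is why the regime suits large or continuous state spaces.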