Over-generalization is a thorny issue in cognitive science, where people may become overly cautious because of past experiences. Agents in multi-agent reinforcement learning (MARL) have likewise been found to suffer from relative over-generalization (RO) and to get stuck in sub-optimal cooperation. Recent methods have shown, algorithmically and empirically, that equipping agents with reasoning ability can mitigate RO, but a theoretical understanding of RO is still lacking, let alone a design for provably RO-free methods. This paper first proves that RO can be avoided when the MARL method satisfies a consistent reasoning requirement under certain conditions. We then introduce a novel reasoning framework, called negotiated reasoning, which is the first to establish the connection between reasoning and RO with theoretical justification. Building on this framework, we propose an instantiated algorithm, Stein variational negotiated reasoning (SVNR), which uses Stein variational gradient descent to derive a negotiation policy that provably avoids RO in MARL under maximum entropy policy iteration. The method is further parameterized with neural networks for amortized learning, which makes computation efficient. Numerical experiments on many RO-challenged environments demonstrate the superiority and efficiency of SVNR compared to state-of-the-art methods in addressing RO.
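To make the notion of relative over-generalization concrete, the following minimal sketch evaluates a small cooperative matrix game. The payoff matrix is the classic "climbing game" from the cooperative-games literature and is an assumption chosen for illustration; it is not taken from this paper, and the helper `independent_choice` is a hypothetical name. Each agent scores its own actions by averaging the shared payoff over a uniformly random partner, which is roughly what an independent learner estimates early in training.

```python
import numpy as np

# Climbing game payoff, shared by both agents (rows: agent 1, columns: agent 2).
# Illustrative matrix from the cooperative-games literature, not from this paper.
PAYOFF = np.array([
    [ 11.0, -30.0, 0.0],
    [-30.0,   7.0, 6.0],
    [  0.0,   0.0, 5.0],
])

def independent_choice(payoff, axis):
    """Pick the action whose payoff is best when averaged over a uniformly random partner."""
    return int(np.argmax(payoff.mean(axis=axis)))

a1 = independent_choice(PAYOFF, axis=1)  # agent 1 averages over agent 2's actions
a2 = independent_choice(PAYOFF, axis=0)  # agent 2 averages over agent 1's actions
joint_best = tuple(int(i) for i in np.unravel_index(np.argmax(PAYOFF), PAYOFF.shape))

print("independent choices:", (a1, a2), "payoff:", PAYOFF[a1, a2])
print("joint optimum:", joint_best, "payoff:", PAYOFF.max())
```

Under this assumed matrix, both agents settle on their third action and receive a payoff of 5, while the joint optimum is the first action for both, with a payoff of 11. The large penalties for miscoordination pull each agent's averaged estimate away from the optimal joint action, which is the failure mode that consistent reasoning about the other agents' choices is meant to avoid.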