Developing effective Multi-Agent Systems (MAS) is critical for many applications that require collaboration and coordination with humans. Despite the rapid advance of Multi-Agent Deep Reinforcement Learning (MADRL) in cooperative MAS, one major challenge remains: independent agents must learn and interact simultaneously in dynamic environments with stochastic rewards. State-of-the-art MADRL models struggle to perform well in Coordinated Multi-agent Object Transportation Problems (CMOTPs), in which agents must coordinate with each other and learn from stochastic rewards. In contrast, humans often learn rapidly to adapt to nonstationary environments that require coordination among people. In this paper, motivated by the demonstrated ability of cognitive models based on Instance-Based Learning Theory (IBLT) to capture human decisions in many dynamic decision-making tasks, we propose three variants of Multi-Agent IBL models (MAIBL). The idea of these MAIBL algorithms is to combine the cognitive mechanisms of IBLT with the techniques of MADRL models to handle coordination in MAS operating in stochastic environments from the perspective of independent learners. We demonstrate that the MAIBL models learn faster and coordinate better than current MADRL models in a dynamic CMOTP task under various settings of stochastic rewards. We discuss the benefits of integrating cognitive insights into MADRL models.