State-of-the-art (SOTA) multiagent reinforcement learning algorithms differ from their single-agent counterparts in many ways. However, most of them still inherit the single-agent exploration-exploitation strategy unchanged. Naively inheriting this strategy can cause collaboration failures in which agents blindly follow mainstream behaviors and refuse to take on minority responsibilities. We name this problem Responsibility Diffusion (RD), as it resembles the social psychology effect of the same name. In this work, we first analyze the cause of the RD problem theoretically and trace it back to the exploration-exploitation dilemma of multiagent systems, especially large-scale ones. We address the RD problem by proposing a Policy Resonance (PR) approach, which modifies the agents' collaborative exploration strategy by refactoring the joint agent policy while keeping individual policies approximately invariant. Next, we show that SOTA algorithms can be equipped with this approach to improve the collaborative performance of agents in complex cooperative tasks. Experiments on multiple benchmark tasks illustrate the effectiveness of this approach.