Value decomposition methods have become increasingly popular in cooperative multi-agent reinforcement learning. However, almost all of them follow the Individual Global Max (IGM) principle or one of its variants, which restricts the range of problems they can solve. Inspired by the notion of dual self-awareness in psychology, we propose a dual self-awareness value decomposition framework that entirely rejects the IGM premise. Each agent consists of an ego policy that carries out actions and an alter ego value function that takes part in credit assignment. An explicit search procedure allows the value function factorization to dispense with the IGM assumption. We also propose a novel anti-ego exploration mechanism to keep the algorithm from becoming stuck in local optima. As the first fully IGM-free value decomposition method, our proposed framework achieves desirable performance in various cooperative tasks.
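For readers unfamiliar with the IGM principle mentioned in the abstract: it requires that the joint action maximizing the joint value function coincide with the tuple of actions each agent would pick by greedily maximizing its own individual utility. The Python sketch below is only an illustration of that condition, not the framework proposed here. The function names, the 3x3 payoff matrix, and the mean-based individual utilities are all assumptions chosen for the example, and ties are broken by first index for simplicity. It shows a case where per-agent greedy selection misses the best joint action, while an explicit (here brute-force) search over joint actions recovers it.

```python
import numpy as np


def joint_greedy(q_tot):
    """Best joint action, found by explicit search over every joint action."""
    flat_index = np.argmax(q_tot)  # first maximizer in row-major order
    return tuple(int(i) for i in np.unravel_index(flat_index, q_tot.shape))


def individual_greedy(q_individual):
    """Each agent independently picks the argmax of its own utility vector."""
    return tuple(int(np.argmax(q)) for q in q_individual)


def igm_holds(q_tot, q_individual):
    """Check the IGM condition (simplified: ties are broken by first index)."""
    return joint_greedy(q_tot) == individual_greedy(q_individual)


# Hypothetical two-agent, one-step cooperative game (illustrative values only).
payoff = np.array([
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
])

# Assumed individual utilities: each agent averages over the other's actions.
q_individual = [payoff.mean(axis=1), payoff.mean(axis=0)]

print("joint search:     ", joint_greedy(payoff))
print("individual greedy:", individual_greedy(q_individual))
print("IGM holds:        ", igm_holds(payoff, q_individual))

# An additive joint value built from the individual utilities satisfies IGM.
q1 = np.array([1.0, 3.0, 2.0])
q2 = np.array([0.0, 5.0, 1.0])
additive_q_tot = q1[:, None] + q2[None, :]
print("IGM holds (additive):", igm_holds(additive_q_tot, [q1, q2]))
```

In the first game the individual utilities steer both agents away from the high-payoff joint action because miscoordination around it is heavily penalized, so the IGM check fails for these utilities. The brute-force search used here scales exponentially with the number of agents and is meant only to make the condition concrete.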