Value decomposition methods have become popular in cooperative multi-agent reinforcement learning (MARL). However, almost all existing methods follow the Individual-Global-Max (IGM) principle or one of its variants, which limits the range of problems they can solve. To address this, we propose a dual self-awareness value decomposition framework that entirely rejects the IGM premise, inspired by the notion of dual self-awareness in psychology. Each agent consists of an ego policy for action selection and an alter ego value function that solves the credit assignment problem. By using an explicit search procedure, the value function factorization can dispense with the IGM assumption. Building on this, we also propose a novel anti-ego exploration mechanism that keeps the algorithm from becoming stuck in a local optimum. As the first fully IGM-free value decomposition method, our framework performs well in a variety of cooperative tasks.
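To make the IGM limitation and the idea of an explicit search more concrete, here is a minimal sketch on a toy two-agent matrix game. It is not the authors' implementation: the payoff matrix, the function names (`satisfies_igm`, `explicit_search`), and the sampling scheme (drawing candidate joint actions from independent per-agent softmax policies and keeping the one with the highest joint value) are all hypothetical, chosen only to illustrate how "ego" policies could propose actions while an "alter ego" joint value function selects among them without requiring IGM.

```python
import itertools
import math
import random

# Hypothetical 2-agent, 3-action payoff matrix (a non-monotonic coordination game):
# the joint optimum (0, 0) is surrounded by heavy miscoordination penalties.
PAYOFF = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]


def joint_value(action):
    """Joint action value Q_tot; here simply the true payoff of the toy game."""
    return PAYOFF[action[0]][action[1]]


def satisfies_igm(q_tot, q_agents):
    """Simplified IGM check: is the tuple of per-agent greedy actions a joint greedy action of q_tot?"""
    greedy_joint = tuple(max(range(len(q)), key=q.__getitem__) for q in q_agents)
    all_joint = itertools.product(*(range(len(q)) for q in q_agents))
    best = max(q_tot(a) for a in all_joint)
    return q_tot(greedy_joint) == best


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def explicit_search(q_tot, policies, num_samples, rng):
    """Hypothetical search: sample joint actions from per-agent policies, keep the best under q_tot."""
    best_action, best_value = None, -math.inf
    for _ in range(num_samples):
        joint = tuple(rng.choices(range(len(p)), weights=p)[0] for p in policies)
        value = q_tot(joint)
        if value > best_value:
            best_action, best_value = joint, value
    return best_action, best_value


# Per-agent utilities obtained by averaging the payoff over the other agent's
# actions (what an additive factorization might learn under uniform exploration).
q1 = [sum(row) / 3 for row in PAYOFF]
q2 = [sum(PAYOFF[i][j] for i in range(3)) / 3 for j in range(3)]
print(satisfies_igm(joint_value, [q1, q2]))  # False: the greedy joint action is (1, 1), worth 0.0

# Uniform "ego" policies propose candidates; the joint value function picks among them.
policies = [softmax([0.0, 0.0, 0.0]), softmax([0.0, 0.0, 0.0])]
print(explicit_search(joint_value, policies, num_samples=500, rng=random.Random(0)))
```

Because the search only ever evaluates sampled joint actions, its cost grows with the number of samples rather than with the exponentially large joint action space, but the guarantee of finding the optimum is only probabilistic and depends on the policies placing enough mass on good actions.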