In multi-agent reinforcement learning (MARL), the Centralized Training with Decentralized Execution (CTDE) framework is pivotal but suffers from a fundamental gap: agents are guided by the global state during training, yet during execution they must rely solely on local observations, with no global signal. Inspired by human societal consensus mechanisms, we introduce the Hierarchical Consensus-based Multi-Agent Reinforcement Learning (HC-MARL) framework to address this limitation. HC-MARL employs contrastive learning to foster a global consensus among agents, enabling cooperative behavior without direct communication. Each agent infers this global consensus from its local observations and uses it as an additional signal to guide collaborative actions during execution. To meet the dynamic requirements of different tasks, the consensus is divided into multiple layers spanning both short-term and long-term considerations: short-term observations form an immediate, low-layer consensus, while long-term observations shape a strategic, high-layer consensus. An adaptive attention mechanism then dynamically adjusts the influence of each consensus layer, balancing immediate reactions against strategic planning according to the demands of the task at hand. Extensive experiments and real-world deployments on multi-robot systems demonstrate our framework's superior performance, with significant gains over baselines.
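The layered-consensus idea above can be illustrated with a minimal numerical sketch. This is not the paper's implementation: the encoders below are random linear maps standing in for the learned contrastive encoders, the dimensions are arbitrary, and `hierarchical_consensus` is a hypothetical helper. It only shows the shape of the computation: each agent maps its local observation to one consensus vector per layer (short-term and long-term), and an attention mechanism blends the layers into a single consensus signal.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    z = np.exp(x - x.max())
    return z / z.sum()

# Hypothetical dimensions (not taken from the paper).
obs_dim, consensus_dim, n_layers = 8, 4, 2  # layer 0: short-term, layer 1: long-term

# Per-layer consensus encoders: random stand-ins for the learned
# contrastive encoders described in the abstract.
W = rng.normal(size=(n_layers, consensus_dim, obs_dim))
# Scoring vector for the adaptive attention over consensus layers.
v = rng.normal(size=consensus_dim)

def hierarchical_consensus(obs):
    """Encode a local observation into per-layer consensus vectors,
    then blend them with adaptive attention weights."""
    layers = np.stack([W[k] @ obs for k in range(n_layers)])  # (n_layers, consensus_dim)
    weights = softmax(layers @ v)                             # one weight per layer
    blended = weights @ layers                                # (consensus_dim,)
    return blended, weights

obs = rng.normal(size=obs_dim)      # one agent's local observation
consensus, weights = hierarchical_consensus(obs)
```

In the full framework, the attention weights would be produced by a trained module conditioned on the task context rather than a fixed scoring vector, so the balance between the immediate and strategic layers shifts with the situation.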