Cooperative Multi-Agent Reinforcement Learning (MARL) addresses complex tasks that require coordination among multiple agents, but existing methods are often limited to either a local (independent learning) or a global (centralized learning) perspective. In this paper, we introduce a novel sequential training scheme and MARL architecture that learns from multiple perspectives across different hierarchy levels. We propose the Hierarchical Lead Critic (HLC), inspired by the hierarchies that emerge naturally in team structures, where high-level objective setting is combined with low-level execution. HLC demonstrates that introducing multiple hierarchies that leverage both local and global perspectives can lead to improved performance with high sample efficiency and robust policies. Experimental results on cooperative, non-communicative, and partially observable MARL benchmarks show that HLC outperforms single-hierarchy baselines and scales robustly with an increasing number of agents and growing task difficulty.