Centralized Training with Decentralized Execution (CTDE) has recently emerged as a popular framework for cooperative Multi-Agent Reinforcement Learning (MARL), in which agents use additional global state information to guide training in a centralized way but make their own decisions based only on decentralized local policies. Despite its encouraging results, CTDE makes an independence assumption on agent policies, which prevents agents from adopting global cooperative information from each other during centralized training. We therefore argue that existing CTDE methods cannot fully utilize global information for training, leading to inefficient joint-policy exploration and even suboptimal results. In this paper, we introduce a novel Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent reinforcement learning that not only enables effective message exchange among agents during training but also guarantees independent policies for execution. First, CADP gives agents an explicit communication channel through which they can seek and take advice from other agents, making training more centralized. Second, to ensure decentralized execution, we propose a smooth model pruning mechanism that progressively constrains agent communication until the channel is closed, without degrading the agents' ability to cooperate. Empirical evaluations on the StarCraft II micromanagement and Google Research Football benchmarks demonstrate that the proposed framework achieves superior performance compared with state-of-the-art counterparts. Our code will be made publicly available.
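To make the advising-then-pruning idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes, purely for illustration, that advice is exchanged through softmax attention weights over agents, and that pruning is driven by a loss equal to the mean negative log of each agent's weight on itself. All names (`attention`, `pruning_loss`, `prune_step`, `advise`) and the toy numbers are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(logits):
    """Row i holds how strongly agent i listens to each agent j."""
    return [softmax(row) for row in logits]

def pruning_loss(weights):
    """Assumed pruning objective: mean negative log of the self-weight.
    It reaches zero only when every agent attends solely to itself."""
    return -sum(math.log(w[i]) for i, w in enumerate(weights)) / len(weights)

def prune_step(logits, lr):
    """One gradient-descent step on pruning_loss with respect to the logits.
    For softmax cross-entropy, d(loss)/d(logit_ij) = (p_ij - [i == j]) / n."""
    n = len(logits)
    weights = attention(logits)
    return [
        [logits[i][j] - lr * (weights[i][j] - (1.0 if i == j else 0.0)) / n
         for j in range(n)]
        for i in range(n)
    ]

def advise(weights, messages):
    """Each agent receives the attention-weighted mix of all agents' messages."""
    dim = len(messages[0])
    return [
        [sum(w_ij * msg[k] for w_ij, msg in zip(row, messages)) for k in range(dim)]
        for row in weights
    ]

# Toy setup: 3 agents, 2-dimensional messages (values are arbitrary).
logits = [[0.0, 1.0, 0.5], [0.3, 0.0, 0.8], [1.2, 0.4, 0.0]]
messages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

initial_mixed = advise(attention(logits), messages)  # open channel: real mixing

losses = []
for _ in range(2000):
    losses.append(pruning_loss(attention(logits)))
    logits = prune_step(logits, lr=0.5)

final_weights = attention(logits)
final_mixed = advise(final_weights, messages)  # nearly closed channel
```

In this sketch the channel closes gradually rather than being cut at once: as the assumed pruning loss falls, each agent's mixed message drifts toward its own message, so a policy that consumes it ends up depending only on local information. In an actual training loop this term would be combined with the task loss; how the two are weighted and scheduled is left open here.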