Centralized Training with Decentralized Execution (CTDE) is a widely adopted paradigm in multi-agent reinforcement learning that uses global information to learn a better joint $Q$-function or centralized critic. In contrast, we investigate how global information can directly enhance the individual $Q$-functions or individual actors. We find that providing the same global information to every agent is insufficient for optimal performance, so we tailor the global information to each agent, producing agent-personalized global information. We further propose a new paradigm, Personalized Training with Distilled Execution (PTDE), in which the agent-personalized global information is distilled into the agent's local information. The distilled information is then used during decentralized execution, with minimal performance degradation. PTDE can be integrated with state-of-the-art algorithms and yields notable performance improvements across diverse benchmarks, including SMAC, Google Research Football (GRF), and a Learning to Rank (LTR) task.