Agent faults pose a significant threat to the performance of multi-agent reinforcement learning (MARL) algorithms, introducing two key challenges. First, agents often struggle to extract critical information from the chaotic state space created by unexpected faults. Second, transitions recorded before and after a fault contribute unevenly to training, leading to a sample-imbalance problem. To overcome these challenges, this paper enhances the fault tolerance of MARL by combining an optimized model architecture with a tailored training-data sampling strategy. Specifically, an attention mechanism is incorporated into the actor and critic networks to automatically detect faults and dynamically regulate the attention paid to faulty agents. Additionally, a prioritization mechanism is introduced to selectively sample the transitions most critical to current training needs. To further support research in this area, we design and open-source a highly decoupled code platform for fault-tolerant MARL, aimed at improving research efficiency on related problems. Experimental results demonstrate the effectiveness of our method in handling various types of faults, faults occurring in any agent, and faults arising at random times.
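The two mechanisms described above can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; it assumes a simple scaled dot-product attention that pools per-agent feature vectors (a faulty agent whose features drift away from the query naturally receives a lower weight) and a generic priority-proportional sampler over replay-buffer transitions. All function names (`attention_pool`, `prioritized_sample`) and the priority exponent `alpha` are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(query, agent_feats):
    """Scaled dot-product attention over per-agent feature vectors.

    query:       (d,) embedding of the current agent's own state.
    agent_feats: (n_agents, d) feature vectors of all agents.
    A faulty agent's features tend to diverge from the query, so its
    score (and hence its attention weight) drops automatically.
    """
    d = query.shape[-1]
    scores = agent_feats @ query / np.sqrt(d)   # (n_agents,)
    weights = softmax(scores)                   # attention over agents
    context = weights @ agent_feats             # pooled context vector
    return weights, context


def prioritized_sample(priorities, k, alpha=0.6, rng=None):
    """Sample k transition indices with prob ∝ priority**alpha.

    Post-fault transitions can be assigned larger priorities so they
    are drawn more often, counteracting the sample-imbalance problem.
    """
    rng = rng or np.random.default_rng(0)
    p = np.asarray(priorities, dtype=float) ** alpha
    p = p / p.sum()
    return rng.choice(len(p), size=k, replace=False, p=p)
```

In this sketch, down-weighting a faulty agent requires no explicit fault label: the attention weights react to the anomalous features themselves, which mirrors the automatic fault detection the abstract describes.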