Federated learning (FL) has become a popular tool for solving traditional Reinforcement Learning (RL) tasks. Its multi-agent structure addresses traditional RL's major weakness of being data-hungry, while the federated mechanism protects the data privacy of individual agents. However, the federated mechanism also exposes the system to poisoning by malicious agents, which can mislead the trained policy. Despite the advantages FL brings, the vulnerability of Federated Reinforcement Learning (FRL) has not been well studied. In this work, we propose a general framework that characterizes FRL poisoning as an optimization problem, and we design a poisoning protocol that can be applied to policy-based FRL. The framework also extends to FRL that uses actor-critic as the local RL algorithm, by training a pair of private and public critics. We prove that our method can strictly hurt the global objective. We verify the effectiveness of our poisoning through extensive experiments that target mainstream RL algorithms across various OpenAI Gym environments covering a wide range of difficulty levels. In these experiments, we compare our proposed framework against clean (unpoisoned) training and baseline poisoning methods. The results show that the proposed framework successfully poisons FRL systems and reduces performance across environments, and that it does so more effectively than the baseline methods. Our work provides new insights into the vulnerability of FL in RL training and poses new challenges for designing robust FRL algorithms.