Federated learning (FL) has become a popular tool for solving traditional Reinforcement Learning (RL) tasks. Its multi-agent structure addresses the heavy data requirements of traditional RL, while the federated mechanism protects the data privacy of individual agents. However, the federated mechanism also exposes the system to poisoning by malicious agents, which can mislead the trained policy. Despite the advantages of FL, the vulnerability of Federated Reinforcement Learning (FRL) has not been well studied. In this work, we propose the first general framework that characterizes FRL poisoning as an optimization problem constrained by a limited budget. We design a poisoning protocol that applies to policy-based FRL and extends to FRL with actor-critic as the local RL algorithm by training a pair of private and public critics. We also discuss a conventional defense strategy inherited from FL to mitigate this risk. We verify the effectiveness of our poisoning through extensive experiments that target mainstream RL algorithms across OpenAI Gym environments covering a wide range of difficulty levels. Our results show that the defense protocol is successful in most cases but is not robust in complicated environments. Our work provides new insights into the vulnerability of FL in RL training and poses additional challenges for designing robust FRL algorithms.