Federated learning (FL) has become a popular tool for solving traditional Reinforcement Learning (RL) tasks. Its multi-agent structure addresses the major concern that traditional RL is data-hungry, while the federated mechanism protects the data privacy of individual agents. However, the federated mechanism also exposes the system to poisoning by malicious agents that can mislead the trained policy. Despite the advantages brought by FL, the vulnerability of Federated Reinforcement Learning (FRL) has not been well studied. In this work, we propose the first general framework to characterize FRL poisoning as an optimization problem constrained by a limited budget, and we design a poisoning protocol that can be applied to policy-based FRL and extended, by training a pair of private and public critics, to FRL that uses actor-critic as the local RL algorithm. We also discuss a conventional defense strategy inherited from FL to mitigate this risk. We verify the effectiveness of our poisoning through extensive experiments that target mainstream RL algorithms across various OpenAI Gym environments covering a wide range of difficulty levels. Our results show that the proposed defense protocol is successful in most cases but is not robust in complicated environments. Our work provides new insights into the vulnerability of FL in RL training and poses additional challenges for designing robust FRL algorithms.