Federated learning (FL) has become a popular tool for solving traditional Reinforcement Learning (RL) tasks. The multi-agent structure addresses the data hunger of traditional RL, a major concern, while the federated mechanism protects the data privacy of individual agents. However, the federated mechanism also exposes the system to poisoning by malicious agents that can mislead the trained policy. Despite the advantages FL brings, the vulnerability of Federated Reinforcement Learning (FRL) has not been well studied. In this work, we propose the first general framework that characterizes FRL poisoning as an optimization problem constrained by a limited budget. We design a poisoning protocol that can be applied to policy-based FRL and, by training a pair of private and public critics, extended to FRL that uses actor-critic as the local RL algorithm. We also discuss a conventional defense strategy inherited from FL to mitigate this risk. We verify the effectiveness of our poisoning through extensive experiments that target mainstream RL algorithms across various OpenAI Gym environments covering a wide range of difficulty levels. Our results show that the defense protocol is successful in most cases but is not robust in complicated environments. Our work provides new insights into the vulnerability of FL in RL training and poses additional challenges for designing robust FRL algorithms.
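To make the idea of budget-constrained poisoning and an FL-style defense concrete, the following toy sketch may help. It is not the protocol proposed in this work: the attacker model (a perturbation of one agent's update, projected onto an L2 ball of radius `budget`), the helper names, and the choice of coordinate-wise median as the "conventional" defense are all assumptions made purely for illustration.

```python
import math
import statistics


def project_to_budget(perturbation, budget):
    """Rescale a perturbation so that its L2 norm does not exceed `budget`."""
    norm = math.sqrt(sum(x * x for x in perturbation))
    if norm == 0.0 or norm <= budget:
        return list(perturbation)
    return [x * budget / norm for x in perturbation]


def poisoned_update(honest_update, budget):
    """Hypothetical attacker: push against the honest direction, within budget."""
    # Unconstrained, this perturbation would exactly reverse the honest update.
    desired = [-2.0 * x for x in honest_update]
    delta = project_to_budget(desired, budget)
    return [h + d for h, d in zip(honest_update, delta)]


def aggregate_mean(updates):
    """Plain federated averaging over per-agent update vectors."""
    return [sum(column) / len(column) for column in zip(*updates)]


def aggregate_median(updates):
    """Coordinate-wise median, a common robust aggregation rule in FL."""
    return [statistics.median(column) for column in zip(*updates)]


# Three honest agents and one malicious agent (all values are made up).
honest = [[1.0, 0.5], [0.9, 0.6], [1.1, 0.4]]
malicious = poisoned_update([1.0, 0.5], budget=10.0)
updates = honest + [malicious]

print("mean:  ", aggregate_mean(updates))
print("median:", aggregate_median(updates))
```

In this toy setting the averaged update is pulled well away from the honest agents' updates, while the coordinate-wise median stays close to them. A smaller `budget` shrinks the attacker's perturbation and therefore its influence on the mean.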