Federated learning (FL) is a privacy-preserving distributed machine learning paradigm that enables collaborative training among geographically distributed, heterogeneous devices without collecting their data. Extending FL beyond supervised learning, federated reinforcement learning (FRL) was proposed to handle sequential decision-making problems in edge computing systems. However, existing FRL algorithms directly combine model-free RL with FL, which often leads to high sample complexity and leaves them without theoretical guarantees. To address these challenges, we propose a novel FRL algorithm that, for the first time, incorporates model-based RL and ensemble knowledge distillation into FL. Specifically, we utilise FL and knowledge distillation to create an ensemble of dynamics models for clients, and then train the policy solely on the ensemble model, without interacting with the environment. Furthermore, we prove theoretically that the proposed algorithm guarantees monotonic improvement. Extensive experimental results demonstrate that our algorithm achieves much higher sample efficiency than classic model-free FRL algorithms on challenging continuous control benchmark environments under edge computing settings. The results also highlight the significant impact of heterogeneous client data and the number of local model update steps on FRL performance, validating the insights from our theoretical analysis.
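The pipeline described in the abstract (clients learn dynamics models, the models are combined through ensemble knowledge distillation, and the policy is trained only inside the learned model) can be illustrated with a toy sketch. This is not the proposed algorithm. The abstract does not give architectures, losses, or update rules, so everything below is an assumption made for illustration: linear least-squares models stand in for neural dynamics models, a grid search over linear feedback gains stands in for RL policy optimisation, the clients simply hand their fitted models to the server, and all function names are hypothetical.

```python
import numpy as np


def features(states, actions):
    """Stack [s, a, 1] so that a linear model can include a bias term."""
    return np.hstack([states, actions, np.ones((len(states), 1))])


def fit_linear_dynamics(states, actions, next_states):
    """Assumed stand-in for a client's dynamics model: s' ~ [s, a, 1] @ W."""
    W, *_ = np.linalg.lstsq(features(states, actions), next_states, rcond=None)
    return W


def make_client_data(rng, A, B, n, noise):
    """Transitions from one client's (private) environment s' = A s + B a + noise."""
    s = rng.uniform(-1.0, 1.0, (n, A.shape[0]))
    a = rng.uniform(-1.0, 1.0, (n, B.shape[1]))
    s_next = s @ A.T + a @ B.T + noise * rng.standard_normal(s.shape)
    return s, a, s_next


def distill(models, state_dim, action_dim, n_samples, rng):
    """Ensemble distillation on synthetic inputs: the student is fitted to the
    averaged predictions of the client models, so no client data is needed."""
    s = rng.uniform(-1.0, 1.0, (n_samples, state_dim))
    a = rng.uniform(-1.0, 1.0, (n_samples, action_dim))
    targets = np.mean([features(s, a) @ W for W in models], axis=0)
    return fit_linear_dynamics(s, a, targets)


def model_rollout_cost(W, gain, s0, horizon=30):
    """Cost of the policy a = -gain @ s, rolled out purely inside the model W."""
    s, cost = s0.copy(), 0.0
    for _ in range(horizon):
        a = -(s @ gain.T)
        s = features(s, a) @ W
        cost += float(np.mean(np.sum(s ** 2, axis=1)))
    return cost


def train_policy_in_model(W, s0, grid):
    """Assumed stand-in for policy training: pick the best gain on a grid."""
    candidates = [np.array([[k1, k2]]) for k1 in grid for k2 in grid]
    return min(candidates, key=lambda K: model_rollout_cost(W, K, s0))


def demo(seed=0, n_clients=4):
    rng = np.random.default_rng(seed)
    A = np.array([[1.0, 0.1], [0.0, 1.0]])  # toy double integrator
    B = np.array([[0.0], [0.1]])
    models = []
    for _ in range(n_clients):
        A_i = A + 0.02 * rng.standard_normal(A.shape)  # heterogeneous clients
        models.append(fit_linear_dynamics(*make_client_data(rng, A_i, B, 200, 0.01)))
    student = distill(models, state_dim=2, action_dim=1, n_samples=256, rng=rng)
    s0 = rng.uniform(-1.0, 1.0, (16, 2))
    gain = train_policy_in_model(student, s0, np.linspace(0.0, 2.0, 5))
    return models, student, gain, s0
```

Calling `demo()` returns the client models, the distilled model, the selected feedback gain, and the initial states used for the rollouts. In this linear toy setting the distilled model coincides with the average of the client weight matrices; that equivalence does not hold for nonlinear models, which is one reason distilling predictions and averaging parameters are distinct operations in general.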