In an era of increasing privacy concerns and growing demand for personalized experiences, traditional Reinforcement Learning with Human Feedback (RLHF) frameworks face significant challenges due to their reliance on centralized data. We introduce Federated Reinforcement Learning with Human Feedback (FedRLHF), a novel framework that decentralizes the RLHF process. FedRLHF enables collaborative policy learning across multiple clients without sharing raw data or human feedback, thereby ensuring robust privacy preservation. Leveraging federated reinforcement learning, each client integrates human feedback locally into its reward function and updates its policy through a personalized RLHF process. We establish rigorous theoretical foundations for FedRLHF, providing convergence guarantees and deriving sample-complexity bounds that scale efficiently with the number of clients. Empirical evaluations on the MovieLens and IMDb datasets demonstrate that FedRLHF not only preserves user privacy but also achieves performance on par with centralized RLHF, while enhancing personalization across diverse client environments.
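To make the workflow concrete, below is a minimal sketch of the federated RLHF loop in the spirit the abstract describes: each client shapes its reward with private human feedback and performs a local policy-gradient update, and the server averages only the resulting policy parameters. The toy environment, the `local_update` rule, and all names and hyperparameters here are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

# Illustrative sketch only: linear softmax policies, synthetic per-client
# human feedback, and FedAvg-style aggregation. Not the paper's algorithm.

STATE_DIM, NUM_ACTIONS, NUM_CLIENTS, ROUNDS = 4, 3, 5, 20
rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def local_update(theta, feedback_prefs, lr=0.1, episodes=32):
    """One round of local RLHF: REINFORCE on a reward shaped by
    (synthetic) local human feedback. Raw data never leaves the client."""
    grad = np.zeros_like(theta)
    for _ in range(episodes):
        s = rng.normal(size=STATE_DIM)                # local state (stays private)
        probs = softmax(theta @ s)
        a = rng.choice(NUM_ACTIONS, p=probs)
        env_reward = s.sum() * (a + 1) / NUM_ACTIONS  # toy environment reward
        human_feedback = rng.normal(loc=feedback_prefs[a])  # private feedback signal
        r = env_reward + human_feedback               # locally shaped reward
        one_hot = np.eye(NUM_ACTIONS)[a]
        # REINFORCE gradient for a linear-softmax policy: r * (e_a - pi) s^T
        grad += r * np.outer(one_hot - probs, s)
    return theta + lr * grad / episodes

# Each client has its own feedback preferences, driving personalization.
client_prefs = [rng.normal(size=NUM_ACTIONS) for _ in range(NUM_CLIENTS)]
global_theta = np.zeros((NUM_ACTIONS, STATE_DIM))

for t in range(ROUNDS):
    # Clients start from the shared policy, adapt it locally, and upload
    # only model parameters -- never raw data or human feedback.
    local_thetas = [local_update(global_theta.copy(), prefs)
                    for prefs in client_prefs]
    global_theta = np.mean(local_thetas, axis=0)      # FedAvg aggregation

print("Final global policy parameters:\n", global_theta)
```

Note that only the locally updated parameters cross the client-server boundary; the states and feedback signals that drive each update remain on the client, which is the privacy property the framework is designed around.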