We consider the problem of federated offline reinforcement learning (RL), a scenario in which distributed learning agents must collaboratively learn a high-quality control policy using only small pre-collected datasets generated by different, unknown behavior policies. Naively combining a standard offline RL approach with a standard federated learning approach to solve this problem can lead to poorly performing policies. In response, we develop the Federated Ensemble-Directed Offline Reinforcement Learning Algorithm (FEDORA), which distills the collective wisdom of the clients using an ensemble learning approach. We develop the FEDORA codebase to utilize distributed compute resources on a federated learning platform. We show that FEDORA significantly outperforms other approaches, including offline RL over the combined data pool, in various complex continuous control environments and on real-world datasets. Finally, we demonstrate the performance of FEDORA in the real world on a mobile robot.