Federated Reinforcement Learning (FedRL) encourages distributed agents to learn collectively from each other's experience and improve their performance without exchanging raw trajectories. Existing work on FedRL assumes that all participating agents are homogeneous, which requires them to share the same policy parameterization (e.g., network architectures and training configurations). In real-world applications, however, agents often differ in their architectures and parameters, possibly because of disparate computational budgets. Because homogeneity cannot be assumed in practice, we introduce the problem setting of Federated Reinforcement Learning with Heterogeneous And bLack-box agEnts (FedRL-HALE). We describe the unique challenges this new setting poses and propose the Federated Heterogeneous Q-Learning (FedHQL) algorithm, which addresses these challenges in a principled manner. Using standard RL tasks, we empirically demonstrate that FedHQL improves the sample efficiency of heterogeneous agents with distinct policy parameterizations.