A core issue in multi-agent federated reinforcement learning is defining how to aggregate insights from multiple agents. This is commonly done by averaging the model weights of all participating agents into one common model (FedAvg). We instead propose FedFormer, a novel federation strategy that uses Transformer attention to contextually aggregate embeddings from models originating from different learner agents. In doing so, we attentively weigh the contributions of other agents with respect to the current agent's environment and learned relationships, thus providing more effective and efficient federation. We evaluate our method on the Meta-World environment and find that it yields significant improvements over FedAvg and over non-federated, single-agent Soft Actor-Critic methods. Compared to Soft Actor-Critic, FedFormer achieves higher episodic return while still abiding by the privacy constraints of federated learning. Finally, we also demonstrate that, in certain tasks, effectiveness improves with larger agent pools across all methods. This contrasts with FedAvg, which fails to improve noticeably when scaled.
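To make the contrast between the two aggregation styles concrete, the following is a minimal NumPy sketch, not the FedFormer implementation. It assumes each agent's model is a list of weight arrays and that each agent can produce a fixed-size embedding for the same input. The function names `fedavg` and `attention_aggregate` are hypothetical, and the attention step omits the learned query, key, and value projections that a Transformer layer would normally apply.

```python
import numpy as np


def fedavg(weight_sets):
    """FedAvg-style aggregation: element-wise mean of each parameter across agents.

    weight_sets: list (one entry per agent) of lists of arrays with matching shapes.
    """
    return [np.mean(np.stack(layers), axis=0) for layers in zip(*weight_sets)]


def attention_aggregate(query, agent_embeddings):
    """Scaled dot-product attention over per-agent embeddings (illustrative only).

    query: (d,) embedding produced by the current agent.
    agent_embeddings: (n_agents, d) embeddings of the same input from each agent.
    Assumption: embeddings serve as both keys and values, with no learned projections.
    """
    d = query.shape[-1]
    scores = agent_embeddings @ query / np.sqrt(d)
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over agents
    return weights @ agent_embeddings, weights
```

In this sketch, `fedavg` treats every agent identically regardless of context, whereas `attention_aggregate` assigns larger weights to agents whose embeddings align with the current agent's query, so the mixture changes with the input.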