Decentralized federated learning (D-FL) allows clients to aggregate learning models locally, offering flexibility and scalability. Existing D-FL methods rely on gossip protocols, which are inefficient when not all nodes in the network are D-FL clients. This paper puts forth a new D-FL strategy, termed Route-and-Aggregate (R&A) D-FL, in which participating clients exchange models with their peers through established routes (as opposed to flooding) and adaptively normalize their aggregation coefficients to compensate for communication errors. The impact of routing and imperfect links on the convergence of R&A D-FL is analyzed, revealing that convergence is optimized when the routes with the minimum end-to-end packet error rates are employed to deliver models. Our analysis is experimentally validated through three image classification tasks and two next-word prediction tasks, utilizing widely recognized datasets and models. In our tested 10-client network, R&A D-FL improves training accuracy by 35% over the flooding-based D-FL method, demonstrating a strong synergy between D-FL and networking. In another test with 10 D-FL clients, the training accuracy of R&A D-FL with communication errors approaches that of the ideal centralized FL (C-FL) without communication errors, as the number of routing nodes (i.e., nodes that do not participate in the training of D-FL) rises to 28.
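The abstract's central mechanism is the adaptive normalization of aggregation coefficients over the models that actually arrive via the routed paths. A minimal sketch of what such a step could look like is given below; the function name, the `weights["self"]` convention, and the use of `None` to mark a model lost to a packet error are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def route_and_aggregate(local_model, received, weights):
    """Sketch of one aggregation step with adaptive normalization (assumed form).

    local_model : this client's model parameters (np.ndarray)
    received    : dict peer_id -> model parameters, or None if the model
                  was lost to an end-to-end packet error on its route
    weights     : nominal aggregation coefficients, with the key "self"
                  for the local model (hypothetical convention)
    """
    # Keep only the models that actually arrived over the routed paths.
    available = {k: m for k, m in received.items() if m is not None}

    # Renormalize the surviving coefficients so they sum to one,
    # compensating for models lost to communication errors.
    total = weights["self"] + sum(weights[k] for k in available)
    aggregated = (weights["self"] / total) * local_model
    for k, model in available.items():
        aggregated += (weights[k] / total) * model
    return aggregated

# Toy usage: peer "B"'s model is lost, so its coefficient is redistributed.
local = np.array([1.0, 1.0])
peers = {"A": np.array([3.0, 3.0]), "B": None}
coeffs = {"self": 0.4, "A": 0.3, "B": 0.3}
print(route_and_aggregate(local, peers, coeffs))  # ~[1.857, 1.857]
```

The renormalization keeps the aggregate an unbiased convex combination of the models that survive transmission, which is the intuition behind compensating for communication errors without retransmissions.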