Federated learning is a privacy-focused approach to machine learning in which models are trained on client devices using locally available data and aggregated at a central server. However, reliance on a single central server becomes a bottleneck when the number of clients is large and also creates a single point of failure. To address these limitations of scalability and fault tolerance, we present a distributed approach to federated learning comprising multiple servers with inter-server communication capabilities. While fully decentralized at the server level, the designed framework retains the core federated learning structure: each server is associated with a disjoint set of clients and communicates with them directly. We propose a novel DFL (Distributed Federated Learning) algorithm that alternates between periods of local training on client data and periods of global training among the servers. We show that, under a suitable choice of parameters, the DFL algorithm ensures that all servers converge to a common model within a small tolerance of the ideal model, demonstrating an effective integration of local and global training. Finally, we illustrate our theoretical claims through numerical simulations.
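As a rough illustration of the alternating structure described above, the following minimal NumPy sketch shows servers running gradient steps on their own clients and then averaging their models over the inter-server network. The names (QuadraticClient, dfl_round, the mixing matrix W) and all parameter values are illustrative assumptions, not taken from the paper, and the toy quadratic losses stand in for real client objectives.

```python
import numpy as np

class QuadraticClient:
    """Toy client holding a local quadratic loss 0.5 * ||x - target||^2 (illustrative only)."""
    def __init__(self, target):
        self.target = np.asarray(target, dtype=float)

    def gradient(self, x):
        return x - self.target

def dfl_round(server_models, clients_per_server, mixing_matrix,
              local_steps=5, consensus_steps=3, lr=0.1):
    """One round of the alternating scheme: local client training, then inter-server mixing."""
    # Local phase: each server runs gradient steps using only its own (disjoint) client set.
    for s, model in enumerate(server_models):
        for _ in range(local_steps):
            grads = [c.gradient(model) for c in clients_per_server[s]]
            model = model - lr * np.mean(grads, axis=0)
        server_models[s] = model
    # Global phase: servers repeatedly average their models over the inter-server
    # network, here modeled by a doubly stochastic mixing matrix.
    stacked = np.stack(server_models)
    for _ in range(consensus_steps):
        stacked = mixing_matrix @ stacked
    return list(stacked)

# Usage: 2 servers, each with 2 clients, fully connected server graph.
servers = [np.zeros(2), np.zeros(2)]
clients = [[QuadraticClient([1.0, 0.0]), QuadraticClient([0.0, 1.0])],
           [QuadraticClient([2.0, 2.0]), QuadraticClient([1.0, 1.0])]]
W = np.array([[0.5, 0.5],
              [0.5, 0.5]])
for _ in range(50):
    servers = dfl_round(servers, clients, W)
print(servers)  # both servers end up close to a common model near the global optimum
```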