Federated and decentralized machine learning leverage end-user devices for privacy-preserving training of models at lower operating costs than training within a data center. In a round of Federated Learning (FL), a random sample of participants trains locally, then a central server aggregates the local models to produce a single model for the next round. In a round of Decentralized Learning (DL), all participants train locally and then aggregate with their immediate neighbors, resulting in many local models with residual variance between them. On the one hand, FL's sampling and lower model variance provide lower communication costs and faster convergence. On the other hand, DL removes the need for a central server and distributes the communication costs more evenly among nodes, albeit at a larger total communication cost and slower convergence. In this paper, we present MoDeST: Mostly-Consistent Decentralized Sampling Training. MoDeST implements decentralized sampling, in which a random subset of nodes is responsible for training and aggregation every round; this provides the benefits of both FL and DL without their traditional drawbacks. Our evaluation of MoDeST on four common learning tasks (i) confirms convergence as fast as FL, (ii) shows a 3x-14x reduction in communication costs compared to DL, and (iii) demonstrates that MoDeST quickly adapts to nodes joining, leaving, or failing, even when 80% of all nodes become unresponsive.
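To make the idea of a per-round sample without a central server concrete, the following is a minimal sketch and not the MoDeST protocol itself, whose sampling mechanism is not specified above. It assumes that every node holds the same membership list, so that each node can derive the same sample from the round number alone. All names (`sample_for_round`, `local_train`, `run_round`) and the toy training rule are hypothetical and chosen only for illustration.

```python
import hashlib


def sample_for_round(node_ids, round_number, sample_size):
    """Pick the round's sample deterministically from the round number.

    Assumption for illustration: all nodes share the same membership list,
    so each node computes the same sample locally, with no server involved.
    """
    def rank(node_id):
        # Hashing (round, node) gives a pseudo-random but reproducible order.
        return hashlib.sha256(f"{round_number}:{node_id}".encode()).hexdigest()

    return sorted(node_ids, key=rank)[:sample_size]


def average(models):
    """Element-wise mean of equally weighted models (lists of floats)."""
    return [sum(values) / len(models) for values in zip(*models)]


def local_train(model, node_id):
    """Hypothetical stand-in for local training on a node's private data."""
    return [w + 0.1 * (node_id - w) for w in model]


def run_round(model, node_ids, round_number, sample_size):
    """One round: only the sampled nodes train, then their models are averaged."""
    sample = sample_for_round(node_ids, round_number, sample_size)
    updates = [local_train(model, node_id) for node_id in sample]
    return average(updates), sample


if __name__ == "__main__":
    nodes = list(range(10))
    model = [0.0, 0.0]
    for round_number in range(3):
        model, sample = run_round(model, nodes, round_number, sample_size=3)
        print(round_number, sample, model)
```

As in FL, only the sampled nodes train and a single model is carried into the next round. The difference in this sketch is that the sample is derived by the nodes themselves instead of being chosen by a server. Handling nodes that join, leave, or fail would need additional machinery that is not shown here.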