We consider an asynchronous decentralized learning system consisting of a network of connected devices that learn a machine learning model without any centralized parameter server. Each user in the network holds its own local training data, which is used for learning across all nodes of the network. The learning method consists of two processes that evolve simultaneously without any required synchronization. The first process is the model update, in which each user updates its local model via a fixed number of stochastic gradient descent steps. The second process is model mixing, in which users communicate with each other via randomized gossiping to exchange their models and average them to reach consensus. In this work, we investigate a staleness criterion for such a system, which is a sufficient condition for the convergence of individual user models. We show that, in the network-scaling regime, i.e., when the number of user devices $n$ is very large, the convergence of user models in finite time is guaranteed if the gossip capacity of individual users scales as $\Omega(\log n)$. Furthermore, we show that any distributed opportunistic scheme can guarantee bounded staleness only with $\Omega(n)$ scaling.
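To make the two-process structure concrete, the following minimal sketch interleaves local SGD updates with randomized pairwise gossip averaging in a toy simulation. It is not the paper's algorithm; the network size, local-step count, learning rate, synthetic data, and the decision to interleave the two processes in a single loop are all illustrative assumptions.

```python
# Minimal sketch (not the paper's algorithm): each node holds a local model,
# runs a fixed number of SGD steps on its own data, and a separate gossip
# step pairs random nodes and averages their models. All names, data, and
# hyperparameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_nodes = 8          # number of user devices (assumed)
dim = 5              # model dimension (assumed)
local_steps = 3      # fixed number of SGD steps per update round (assumed)
lr = 0.1             # learning rate (assumed)

# Synthetic local data: each node i fits a linear model to its own samples.
targets = rng.normal(size=dim)                      # shared ground truth
data = [rng.normal(size=(20, dim)) for _ in range(n_nodes)]
labels = [X @ targets + 0.01 * rng.normal(size=20) for X in data]
models = [rng.normal(size=dim) for _ in range(n_nodes)]

def local_update(i):
    """Process 1: a fixed number of stochastic gradient steps on node i's data."""
    for _ in range(local_steps):
        idx = rng.integers(len(data[i]))            # one random sample
        x, y = data[i][idx], labels[i][idx]
        grad = 2 * (x @ models[i] - y) * x          # squared-loss gradient
        models[i] -= lr * grad

def gossip_round():
    """Process 2: randomized gossip -- a random pair exchanges and averages models."""
    i, j = rng.choice(n_nodes, size=2, replace=False)
    avg = 0.5 * (models[i] + models[j])
    models[i], models[j] = avg.copy(), avg.copy()

# In the actual system the two processes run concurrently without
# synchronization; here they are interleaved purely for illustration.
for t in range(500):
    local_update(rng.integers(n_nodes))             # some node updates locally
    gossip_round()                                  # some pair gossips

spread = max(np.linalg.norm(m - np.mean(models, axis=0)) for m in models)
print(f"consensus spread after mixing: {spread:.4f}")
```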