We consider an asynchronous decentralized learning system consisting of a network of connected devices that learn a machine learning model without any centralized parameter server. Each user in the network holds its own local training data, which is used for learning across all nodes of the network. The learning method consists of two processes that evolve simultaneously without any required synchronization. The first process is the model update, in which each user updates its local model via a fixed number of stochastic gradient descent steps. The second process is model mixing, in which users communicate with each other via randomized gossiping to exchange their models and average them to reach consensus. In this work, we investigate a staleness criterion for such a system, which is a sufficient condition for the convergence of individual user models. We show that, in the network-scaling regime, i.e., when the number of user devices $n$ is very large, the convergence of user models in finite time is guaranteed if the gossip capacity of individual users scales as $\Omega(\log n)$. Furthermore, we show that any distributed opportunistic scheme can guarantee bounded staleness only with $\Omega(n)$ scaling.
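To make the two-process structure concrete, the following minimal sketch interleaves local SGD updates with randomized pairwise gossip averaging in a toy simulation. It is not the paper's algorithm; the network size, local-step count, learning rate, synthetic data, and the decision to interleave the two processes in a single loop are all illustrative assumptions.

```python
# Minimal sketch (not the paper's algorithm): each node holds a local model,
# runs a fixed number of SGD steps on its own data, and a separate gossip
# step pairs random nodes and averages their models. All names, data, and
# hyperparameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_nodes = 8          # number of user devices (assumed)
dim = 5              # model dimension (assumed)
local_steps = 3      # fixed number of SGD steps per update round (assumed)
lr = 0.1             # learning rate (assumed)

# Synthetic local data: each node i fits a linear model to its own samples.
targets = rng.normal(size=dim)                      # shared ground truth
data = [rng.normal(size=(20, dim)) for _ in range(n_nodes)]
labels = [X @ targets + 0.01 * rng.normal(size=20) for X in data]
models = [rng.normal(size=dim) for _ in range(n_nodes)]

def local_update(i):
    """Process 1: a fixed number of stochastic gradient steps on node i's data."""
    for _ in range(local_steps):
        idx = rng.integers(len(data[i]))            # one random sample
        x, y = data[i][idx], labels[i][idx]
        grad = 2 * (x @ models[i] - y) * x          # squared-loss gradient
        models[i] -= lr * grad

def gossip_round():
    """Process 2: randomized gossip -- a random pair exchanges and averages models."""
    i, j = rng.choice(n_nodes, size=2, replace=False)
    avg = 0.5 * (models[i] + models[j])
    models[i], models[j] = avg.copy(), avg.copy()

# In the actual system the two processes run concurrently without
# synchronization; here they are interleaved purely for illustration.
for t in range(500):
    local_update(rng.integers(n_nodes))             # some node updates locally
    gossip_round()                                  # some pair gossips

spread = max(np.linalg.norm(m - np.mean(models, axis=0)) for m in models)
print(f"consensus spread after mixing: {spread:.4f}")
```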