With the increasing demand for large-scale training of machine learning models, fully decentralized optimization methods have recently been advocated as alternatives to the popular parameter server framework. In this paradigm, each worker maintains a local estimate of the optimal parameter vector and iteratively updates it by waiting for and averaging the estimates of all its neighbors, then correcting the result using its local dataset. However, this synchronization phase is sensitive to stragglers, since each worker must wait for its slowest neighbor. An efficient way to mitigate this effect is to use asynchronous updates, in which each worker computes stochastic gradients and communicates with other workers at its own pace. Unfortunately, fully asynchronous updates suffer from the staleness of the stragglers' parameters. To address these limitations, we propose DSGD-AAU, a fully decentralized algorithm with adaptive asynchronous updates that adaptively determines the number of neighbor workers with which each worker communicates. We show that DSGD-AAU achieves a linear speedup in convergence (i.e., convergence performance improves linearly with the number of workers). Experimental results on a suite of datasets and deep neural network models verify our theoretical results.
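As an illustration of the synchronous paradigm described above (not of DSGD-AAU itself, whose details are not given here), the following is a minimal sketch of decentralized SGD in which each worker averages its estimate with those of its neighbors and then applies a local gradient correction. The ring topology, uniform averaging weights, scalar quadratic local losses, and the names `ring_neighbors` and `dsgd_ring` are all assumptions made for this example.

```python
import random


def ring_neighbors(i, n):
    """Indices of worker i's two neighbors on a ring of n workers (assumed topology)."""
    return [(i - 1) % n, (i + 1) % n]


def dsgd_ring(targets, steps=200, lr=0.1, noise_std=0.0, seed=0):
    """Synchronous decentralized SGD on a ring (illustrative sketch only).

    Worker i holds the hypothetical local loss f_i(x) = 0.5 * (x - targets[i])**2,
    so the minimizer of the average loss is the mean of `targets`.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n  # each worker's local estimate of the optimum
    for _ in range(steps):
        new_x = []
        for i in range(n):
            # Synchronization phase: worker i needs the current estimates
            # of all its neighbors before it can proceed.
            group = [i] + ring_neighbors(i, n)
            avg = sum(x[j] for j in group) / len(group)
            # Local correction: stochastic gradient of f_i at worker i's
            # current estimate (exact gradient plus optional Gaussian noise).
            grad = (x[i] - targets[i]) + rng.gauss(0.0, noise_std)
            new_x.append(avg - lr * grad)
        x = new_x  # all workers advance together (synchronous round)
    return x


if __name__ == "__main__":
    estimates = dsgd_ring([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], steps=500, lr=0.05)
    print("per-worker estimates:", estimates)
    print("network average:", sum(estimates) / len(estimates))
```

In this sketch every round blocks until all neighbor estimates are available, which is the step that makes the synchronous scheme sensitive to stragglers. With a constant step size the individual estimates settle near, rather than exactly at, the global minimizer, while their network average approaches it.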