With the increasing demand for large-scale training of machine learning models, fully decentralized optimization methods have recently been advocated as alternatives to the popular parameter server framework. In this paradigm, each worker maintains a local estimate of the optimal parameter vector and updates it iteratively: it waits for the estimates of all its neighbors, averages them, and then corrects the result using its local dataset. However, this synchronization phase is sensitive to stragglers, since every worker must wait for its slowest neighbor. An efficient way to mitigate this effect is to use asynchronous updates, in which each worker computes stochastic gradients and communicates with other workers at its own pace. Unfortunately, fully asynchronous updates suffer from the staleness of stragglers' parameters. To address these limitations, we propose DSGD-AAU, a fully decentralized algorithm with adaptive asynchronous updates that adaptively determines the number of neighbor workers each worker communicates with. We show that DSGD-AAU achieves a linear speedup for convergence and demonstrate its effectiveness via extensive experiments.
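The averaging-then-correcting update described above can be illustrated with a minimal sketch. This is not the DSGD-AAU algorithm itself, whose adaptive rule is not specified here. It is a toy decentralized SGD round on scalar parameters, and everything in it is an assumption made for illustration: the ring topology, the quadratic local loss, the uniform averaging weights, and the names `dsgd_round`, `ring`, and `run`. Passing `k=None` mimics the synchronous scheme (wait for all neighbors), while an integer `k` averages with only `k` randomly chosen neighbors, a crude stand-in for "the neighbors that responded first".

```python
import random


def local_gradient(x, target):
    # Gradient of the toy local loss 0.5 * (x - target)**2 (an assumption).
    return x - target


def ring(n):
    # Hypothetical topology: worker i talks to its two ring neighbors.
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def dsgd_round(params, targets, neighbors, lr, k=None, rng=None):
    """One round of decentralized SGD over all workers.

    k=None: each worker averages with all of its neighbors (synchronous).
    k=int:  each worker averages with k randomly sampled neighbors only,
            simulating a worker that does not wait for stragglers.
    """
    if rng is None:
        rng = random.Random(0)
    new_params = []
    for i, x in enumerate(params):
        nbrs = neighbors[i]
        if k is not None:
            nbrs = rng.sample(nbrs, min(k, len(nbrs)))
        # Step 1: average the local estimate with the selected neighbors.
        avg = (x + sum(params[j] for j in nbrs)) / (1 + len(nbrs))
        # Step 2: correct the average with a gradient step on local data.
        new_params.append(avg - lr * local_gradient(avg, targets[i]))
    return new_params


def run(n=8, rounds=200, lr=0.05, k=None, seed=0):
    rng = random.Random(seed)
    targets = [float(i) for i in range(n)]  # each worker's local optimum
    params = [0.0] * n
    nbrs = ring(n)
    for _ in range(rounds):
        params = dsgd_round(params, targets, nbrs, lr, k=k, rng=rng)
    return params, sum(targets) / n


if __name__ == "__main__":
    sync_params, optimum = run(k=None)
    partial_params, _ = run(k=1)
    print("global optimum:", optimum)
    print("synchronous mean:", sum(sync_params) / len(sync_params))
    print("partial (k=1) mean:", sum(partial_params) / len(partial_params))
```

In the synchronous case the uniform ring weights form a doubly stochastic mixing matrix, so the network-wide average of the estimates follows plain gradient descent toward the global optimum. With partial averaging that property no longer holds exactly, which is one reason the choice of how many neighbors to wait for matters.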