Decentralized stochastic optimization is a fundamental paradigm for large-scale learning over networks, in which agents communicate only with their neighbors and no central coordinator is required. For strongly convex problems, communication efficiency is governed mainly by the condition number \(κ=L/μ\) and the network spectral gap \(1-β\). Although deterministic decentralized methods can simultaneously achieve the accelerated \(\sqrt{κ}\) and \(1/\sqrt{1-β}\) dependences, no existing stochastic method attains both improvements at once. In this paper, we propose \emph{Multi-Gossip Accelerated DSGD} (MG-ADSGD), a decentralized stochastic algorithm that combines Nesterov-type primal--dual extrapolation with multi-round fast gossip averaging. The key idea is to couple the gossip depth with the mini-batch size, so that additional communication rounds simultaneously improve consensus accuracy and reduce gradient variance. We show that MG-ADSGD achieves the communication complexity \[ \widetilde{\mathcal O}\!\left( \frac{σ^2}{μnε}\log\frac{1}{ε} + \sqrt{\frac{κ}{1-β}}\log\frac{1}{ε} \right), \] where \(ε\) denotes the target accuracy, \(n\) is the number of nodes, and \(σ^2\) is the gradient variance. To the best of our knowledge, this is the best communication complexity currently available for decentralized stochastic strongly convex optimization, up to logarithmic factors that are independent of \(ε\).
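To illustrate why additional communication rounds improve consensus accuracy, the following minimal sketch runs plain multi-round gossip averaging on a ring network. It is not the MG-ADSGD algorithm: the ring topology, the uniform 1/3 mixing weights, and the function names `ring_gossip_matrix`, `gossip`, and `consensus_error` are all assumptions made for illustration, and neither the accelerated ("fast") gossip scheme nor the coupling with the mini-batch size is modeled here.

```python
def ring_gossip_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 nodes.

    Illustrative assumption: each node weights itself and its two
    neighbors equally with 1/3.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def gossip(W, x, rounds):
    """Apply `rounds` steps of plain (non-accelerated) gossip x <- W x."""
    n = len(x)
    for _ in range(rounds):
        x = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


def consensus_error(x):
    """Largest deviation of any node's value from the network average."""
    mean = sum(x) / len(x)
    return max(abs(v - mean) for v in x)


if __name__ == "__main__":
    n = 8
    W = ring_gossip_matrix(n)
    x0 = [float(i) for i in range(n)]  # one scalar per node
    for k in (0, 1, 5, 10, 20):
        print(k, consensus_error(gossip(W, x0, k)))
```

Because the mixing matrix is doubly stochastic, the network average is preserved at every round while the deviation from it contracts, which is the consensus effect that a larger gossip depth is meant to exploit.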