This paper proposes the Doubly Compressed Momentum-assisted stochastic gradient tracking algorithm $\texttt{DoCoM}$ for communication-efficient decentralized optimization. The algorithm combines two ingredients to achieve a near-optimal sample complexity while allowing for communication compression. First, it tracks both the averaged iterate and the averaged stochastic gradient using compressed gossiping consensus. Second, it incorporates a momentum step that performs adaptive variance reduction on the local gradient estimates. We show that $\texttt{DoCoM}$ finds a near-stationary solution at all participating agents satisfying $\mathbb{E}[ \| \nabla f( \theta ) \|^2 ] = \mathcal{O}( 1 / T^{2/3} )$ in $T$ iterations, where $f(\theta)$ is a smooth (possibly non-convex) objective function. The proof rests on a new potential function that tightly tracks the one-iteration progress of $\texttt{DoCoM}$. As a corollary, our analysis also establishes the linear convergence of $\texttt{DoCoM}$ to a globally optimal solution for objective functions satisfying the Polyak-{\L}ojasiewicz condition. Numerical experiments demonstrate that our algorithm outperforms several state-of-the-art algorithms in practice.
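The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the $\texttt{DoCoM}$ update itself, which the abstract does not spell out. It assumes a top-$k$ sparsifier as the compressor, a CHOCO-style gossip round in which agents exchange compressed differences against public copies of their states, and a recursive momentum estimator of the form $v_{t+1} = g(\theta_{t+1}; \xi) + (1-\beta)\,(v_t - g(\theta_t; \xi))$. All function names, the mixing matrix `W`, the step size `gamma`, and the momentum parameter `beta` are hypothetical choices made for illustration.

```python
def top_k(vec, k):
    """Assumed compressor: keep the k largest-magnitude entries, zero the rest."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(vec)]


def compressed_gossip_round(x, x_hat, W, gamma, k):
    """One CHOCO-style round (an assumption, not the paper's exact rule).

    x[i]     : private state of agent i
    x_hat[i] : public copy of agent i's state, known to its neighbours
    W        : doubly stochastic mixing matrix (hypothetical)
    """
    n = len(x)
    # Each agent transmits only the compressed gap to its public copy.
    q = [top_k([a - b for a, b in zip(x[i], x_hat[i])], k) for i in range(n)]
    x_hat = [[h + d for h, d in zip(x_hat[i], q[i])] for i in range(n)]
    # The consensus correction is computed from the public copies alone.
    new_x = []
    for i in range(n):
        corr = [sum(W[i][j] * (x_hat[j][c] - x_hat[i][c]) for j in range(n))
                for c in range(len(x[i]))]
        new_x.append([xi + gamma * ci for xi, ci in zip(x[i], corr)])
    return new_x, x_hat


def momentum_vr_step(v_prev, grad_new, grad_old, beta):
    """Recursive momentum estimate: g(new) + (1 - beta) * (v_prev - g(old)).

    grad_new and grad_old are assumed to be evaluated on the same sample.
    """
    return [gn + (1.0 - beta) * (vp - go)
            for vp, gn, go in zip(v_prev, grad_new, grad_old)]


if __name__ == "__main__":
    W = [[0.5, 0.5], [0.5, 0.5]]
    x = [[1.0, 0.0], [3.0, 4.0]]
    x_hat = [[0.0, 0.0], [0.0, 0.0]]
    x, x_hat = compressed_gossip_round(x, x_hat, W, gamma=0.5, k=1)
    print(x)
    print(momentum_vr_step([1.0, 2.0], [0.5, 0.5], [0.25, 0.25], beta=0.1))
```

Because `W` is doubly stochastic, the gossip round leaves the network-wide average of the states unchanged even though each agent transmits only $k$ coordinates. This is the property that lets a tracking scheme follow an averaged quantity under compression. Setting `beta = 1` reduces the estimator to a plain stochastic gradient, while smaller values reuse more of the previous estimate.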