Communication compression is a common technique in distributed optimization that alleviates communication overhead by transmitting compressed gradients and model parameters. However, compression introduces information distortion, which slows down convergence and requires more communication rounds to reach a desired solution. Given this trade-off between lower per-round communication cost and additional rounds of communication, it is unclear whether communication compression reduces the total communication cost. This paper explores the conditions under which unbiased compression, a widely used form of compression, can reduce the total communication cost, as well as the extent to which it can do so. To this end, we present the first theoretical formulation for characterizing the total communication cost in distributed optimization with communication compression. We demonstrate that unbiased compression alone does not necessarily reduce the total communication cost, but that a reduction can be achieved if the compressors used by all workers are further assumed to be independent. We establish lower bounds on the communication rounds required by algorithms using independent unbiased compressors to minimize smooth convex functions, and show that these lower bounds are tight by refining the analysis for ADIANA. Our results reveal that using independent unbiased compression can reduce the total communication cost by a factor of up to $\Theta(\sqrt{\min\{n, \kappa\}})$ when all local smoothness constants are constrained by a common upper bound, where $n$ is the number of workers and $\kappa$ is the condition number of the functions being minimized. These theoretical findings are supported by experimental results.
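To make the role of independence concrete, the following is a minimal illustrative sketch and not the paper's algorithm or analysis. It assumes rand-k sparsification as the unbiased compressor, and it assumes a toy setting in which every worker holds the same vector `x`. The helper names `rand_k`, `mean_vectors`, `sq_error`, and `avg_error` are hypothetical. The sketch compares the squared error of the server-side average when workers draw their compressors independently against the case where all workers share one random draw.

```python
import random


def rand_k(x, k, rng):
    """Rand-k sparsification (assumed compressor for this sketch).

    Keeps k coordinates chosen uniformly at random and scales them by d/k,
    so that the expectation of the output equals x (unbiasedness).
    """
    d = len(x)
    kept = set(rng.sample(range(d), k))
    scale = d / k
    return [scale * v if i in kept else 0.0 for i, v in enumerate(x)]


def mean_vectors(vectors):
    """Coordinate-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def avg_error(x, k, n_workers, independent, trials, seed=0):
    """Average squared error of the aggregated compressed messages.

    Toy assumption: every worker holds the same vector x.
    independent=True  -> each worker draws its own random coordinates.
    independent=False -> all workers reuse a single shared draw.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        if independent:
            msgs = [rand_k(x, k, rng) for _ in range(n_workers)]
        else:
            shared = rand_k(x, k, rng)
            msgs = [shared] * n_workers
        total += sq_error(mean_vectors(msgs), x)
    return total / trials


if __name__ == "__main__":
    x = [1.0] * 8
    shared = avg_error(x, k=2, n_workers=4, independent=False, trials=2000)
    indep = avg_error(x, k=2, n_workers=4, independent=True, trials=2000)
    print("shared randomness:", shared)
    print("independent compressors:", indep)
```

In this toy setting, the rand-k compression error is $(d/k - 1)\|x\|^2$ per message. Averaging $n$ independent draws divides that error by $n$, while sharing a single draw leaves it unchanged. This is only an intuition for why independence matters. The lower bounds and the $\Theta(\sqrt{\min\{n, \kappa\}})$ factor stated in the abstract come from the paper's own analysis, not from this sketch.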