In this work we focus on distributed optimization problems in settings where the communication time between the server and the workers is non-negligible. We obtain novel methods supporting bidirectional compression (both from the server to the workers and vice versa) that enjoy new state-of-the-art theoretical communication complexity for convex and nonconvex problems. Our bounds are the first to decouple the variance/error coming from the workers-to-server and server-to-workers compression, transforming a multiplicative dependence into an additive one. Moreover, in the convex regime, we obtain the first bounds that match the theoretical communication complexity of gradient descent. Even in this convex regime, our algorithms work with biased gradient estimators, which is non-standard and requires new proof techniques that may be of independent interest. Finally, we corroborate our theoretical results with experiments.