Quantization (Alistarh et al., 2017) is an important (stochastic) compression technique that reduces the number of bits transmitted in each communication round of distributed model training. Suresh et al. (2022) introduce correlated quantizers and show their advantages over independent counterparts by analyzing the communication complexity of distributed SGD. We analyze the state-of-the-art distributed non-convex optimization algorithm MARINA (Gorbunov et al., 2022) equipped with these correlated quantizers and show that it outperforms both the original MARINA and the distributed SGD of Suresh et al. (2022) in terms of communication complexity. We significantly refine the original analysis of MARINA, without any additional assumptions, using the weighted Hessian variance (Tyurin et al., 2022). We then expand the theoretical framework of MARINA to accommodate a substantially broader range of potentially correlated and biased compressors, thus extending the applicability of the method beyond the conventional setup of independent unbiased compressors. Extensive experimental results corroborate our theoretical findings.
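To make the distinction between independent and correlated quantizers concrete, the following minimal sketch quantizes one scalar per client to a single bit. It is an illustration under stated assumptions, not the construction analyzed in the paper: the function names `independent_quantize` and `correlated_quantize` are hypothetical, every client's value is assumed to lie in a known range `[lo, hi]`, and the correlated variant uses a shared random permutation of strata, which is our reading of the idea behind the quantizers of Suresh et al. (2022). Both quantizers are unbiased per client; only the joint distribution of the rounding decisions differs.

```python
import random


def independent_quantize(xs, lo=0.0, hi=1.0, rng=random):
    """One-bit stochastic rounding; each client draws its own uniform variable."""
    out = []
    for x in xs:
        p = (x - lo) / (hi - lo)  # probability of rounding up to hi
        out.append(hi if rng.random() < p else lo)
    return out


def correlated_quantize(xs, lo=0.0, hi=1.0, rng=random):
    """One-bit stochastic rounding with shared, stratified randomness.

    Assumption: a shared random permutation assigns each of the n clients a
    distinct stratum [k/n, (k+1)/n). Each threshold u is still uniform on
    [0, 1), so every client stays unbiased, but the rounding errors of
    different clients become negatively correlated.
    """
    n = len(xs)
    perm = list(range(n))
    rng.shuffle(perm)  # shared randomness across clients
    out = []
    for i, x in enumerate(xs):
        p = (x - lo) / (hi - lo)
        u = (perm[i] + rng.random()) / n  # uniform within client i's stratum
        out.append(hi if u < p else lo)
    return out


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [0.5] * 8  # every client holds the same value
    for name, quantizer in [("independent", independent_quantize),
                            ("correlated", correlated_quantize)]:
        errors = [(mean(quantizer(xs, rng=rng)) - mean(xs)) ** 2
                  for _ in range(2000)]
        print(name, "empirical MSE of the averaged estimate:", mean(errors))
```

When all clients hold similar values, the stratified thresholds make the rounding errors cancel in the server-side average, which is the intuition for why correlated quantizers can lower the variance term that drives communication complexity.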