Decentralized training is often regarded as inferior to centralized training because the consensus errors between workers are thought to undermine convergence and generalization, even when data distributions are homogeneous. This work challenges that view by introducing Decentralized SGD with Adaptive Consensus (DSGD-AC), which intentionally preserves non-vanishing consensus errors through a time-dependent scaling mechanism. We prove that these errors are not random noise but align systematically with the dominant Hessian subspace, acting as structured perturbations that guide optimization toward flatter minima. Across image classification and machine translation benchmarks, DSGD-AC consistently surpasses both standard DSGD and centralized SGD in test accuracy and solution flatness. Together, these results establish consensus error as a useful implicit regularizer and offer a new perspective on the design of decentralized learning algorithms.
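To make the mechanism concrete, the following is a minimal NumPy sketch of a decentralized SGD step in which the gossip-averaging pull toward consensus is damped by a time-dependent factor, so the consensus error is kept away from zero rather than driven to vanish. The abstract does not specify DSGD-AC's actual scaling rule, so the schedule `alpha(t)`, its floor, and the toy quadratic objective below are hypothetical placeholders for illustration only.

```python
# Sketch of decentralized SGD with a time-dependent consensus scaling.
# The mixing strength alpha(t) is a hypothetical stand-in for the paper's
# "time-dependent scaling mechanism"; it decays toward a positive floor so
# the gossip step never enforces exact consensus between workers.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, eta, steps = 8, 10, 0.05, 500

# Doubly stochastic ring-gossip matrix: each worker averages with 2 neighbors.
W = np.zeros((n_workers, n_workers))
for i in range(n_workers):
    W[i, i] = 0.5
    W[i, (i - 1) % n_workers] = 0.25
    W[i, (i + 1) % n_workers] = 0.25

# Toy homogeneous objective shared by all workers: f(x) = 0.5 * ||A x - b||^2.
A = rng.normal(size=(dim, dim)) / np.sqrt(dim)
b = rng.normal(size=dim)

def grad(x):
    """Stochastic gradient of the shared quadratic, with sampling noise."""
    return A.T @ (A @ x - b) + 0.1 * rng.normal(size=x.shape)

def alpha(t, floor=0.3):
    """Hypothetical adaptive consensus strength: starts at full mixing and
    decays toward a positive floor, weakening the consensus pull over time."""
    return floor + (1.0 - floor) / (1.0 + 0.01 * t)

X = rng.normal(size=(n_workers, dim))  # one parameter row per worker
for t in range(steps):
    G = np.stack([grad(X[i]) for i in range(n_workers)])
    mixed = W @ X  # standard gossip-averaging step of DSGD
    # Apply only an alpha(t)-scaled fraction of the consensus pull, so the
    # consensus error ||x_i - x_bar|| is preserved at a controlled level.
    X = (1 - alpha(t)) * X + alpha(t) * mixed - eta * G

consensus_err = np.linalg.norm(X - X.mean(axis=0))
print(f"final consensus error: {consensus_err:.4f}")
```

With `alpha(t) = 1` throughout, this reduces to standard DSGD; keeping the floor strictly positive but below one leaves a persistent, structured disagreement between workers, which is the kind of non-vanishing perturbation the abstract argues acts as implicit regularization.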