This paper presents a new generalization error analysis of the Decentralized Stochastic Gradient Descent (D-SGD) algorithm based on algorithmic stability. The results substantially improve upon the state of the art and even invalidate earlier claims that the communication graph has a detrimental effect on generalization. For instance, we show that in convex settings, D-SGD has the same generalization bounds as the classical SGD algorithm, regardless of the choice of graph. We show that this counter-intuitive result comes from considering the average of local parameters, which hides a final global averaging step that is incompatible with the decentralized scenario. In light of this observation, we advocate analyzing the supremum over local parameters and show that, in this case, the graph does have an impact on generalization. Unlike prior results, our analysis yields non-vacuous bounds even for non-connected graphs.
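The difference between the average of local parameters and the worst case over local parameters can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the paper's setup: it assumes the common "combine, then take a local gradient step" form of D-SGD, a synthetic least-squares loss, and a doubly stochastic gossip matrix `W`. The helper names `ring_gossip_matrix`, `d_sgd`, and `summarize` are hypothetical and chosen for illustration.

```python
import numpy as np


def ring_gossip_matrix(m):
    """Symmetric doubly stochastic gossip matrix for a ring of m >= 3 agents."""
    W = np.zeros((m, m))
    for i in range(m):
        # Each agent averages itself and its two ring neighbors equally.
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % m] = 1.0 / 3.0
        W[i, (i + 1) % m] = 1.0 / 3.0
    return W


def d_sgd(X, y, W, lr=0.02, steps=200, seed=0):
    """Run an assumed D-SGD variant on local least-squares problems.

    X has shape (m, n, d): m agents, n local samples, d features.
    y has shape (m, n). Returns the (m, d) array of local parameters.
    """
    rng = np.random.default_rng(seed)
    m, n, d = X.shape
    theta = np.zeros((m, d))
    agents = np.arange(m)
    for _ in range(steps):
        idx = rng.integers(n, size=m)          # one local sample per agent
        x = X[agents, idx]                     # shape (m, d)
        t = y[agents, idx]                     # shape (m,)
        residual = np.einsum("md,md->m", x, theta) - t
        grads = residual[:, None] * x          # gradient of 0.5 * (x @ theta - t)^2
        theta = W @ theta - lr * grads         # gossip averaging, then local step
    return theta


def summarize(theta):
    """Return the averaged parameter and the largest local deviation from it."""
    avg = theta.mean(axis=0)
    worst_gap = np.max(np.linalg.norm(theta - avg, axis=1))
    return avg, worst_gap


# Synthetic data (an assumption made purely for illustration).
rng = np.random.default_rng(1)
m, n, d = 6, 20, 5
X = rng.normal(size=(m, n, d))
theta_star = rng.normal(size=d)
y = X @ theta_star + 0.1 * rng.normal(size=(m, n))

W_ring = ring_gossip_matrix(m)
W_disconnected = np.eye(m)                     # no communication at all

theta_ring = d_sgd(X, y, W_ring)
theta_disc = d_sgd(X, y, W_disconnected)

avg_ring, gap_ring = summarize(theta_ring)
avg_disc, gap_disc = summarize(theta_disc)
```

Because `W` is doubly stochastic, the gossip step leaves the mean of the local parameters unchanged, so the averaged iterate is well defined for any graph, including the disconnected one. The quantity `worst_gap` instead tracks how far individual agents sit from that average, which is where the graph can matter.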