This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The obtained results overturn the conclusions of a series of recent works that suggested an increased instability due to decentralization and a detrimental impact of poorly-connected communication graphs on generalization. On the contrary, we show, for convex, strongly convex, and non-convex functions, that D-SGD can always recover generalization bounds analogous to those of classical SGD, suggesting that the choice of graph does not matter. We then argue that this result stems from a worst-case analysis, and we provide a refined optimization-dependent generalization bound for general convex functions. This new bound reveals that the choice of graph can in fact improve the worst-case bound in certain regimes, and that, surprisingly, a poorly-connected graph can even be beneficial for generalization.
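To make the object of study concrete, here is a minimal sketch of the standard D-SGD update (local stochastic gradient step followed by gossip averaging with the neighbors given by the communication graph). The ring graph, the quadratic local losses, the noise level, and all variable names are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def ring_mixing_matrix(n):
    """Doubly-stochastic mixing matrix of a ring graph (a poorly-connected
    topology): each worker averages itself with its two neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n] = 1 / 3
        W[i, (i + 1) % n] = 1 / 3
    return W

def dsgd(W, targets, lr=0.1, steps=200):
    """Illustrative D-SGD on local losses f_i(x) = ||x - a_i||^2 / 2.
    Each step: local (noisy) gradient step, then mixing over the graph."""
    n, dim = targets.shape
    X = np.zeros((n, dim))                            # row i = worker i's iterate
    for _ in range(steps):
        grads = X - targets                           # gradients of local quadratics
        grads += 0.01 * rng.standard_normal(X.shape)  # stochastic-gradient noise
        X = W @ (X - lr * grads)                      # gradient step, then gossip
    return X

n, dim = 8, 5
targets = rng.standard_normal((n, dim))               # a_i: worker i's local target
X = dsgd(ring_mixing_matrix(n), targets)
print("consensus distance:", np.linalg.norm(X - X.mean(axis=0)))
print("distance to optimum:", np.linalg.norm(X.mean(axis=0) - targets.mean(axis=0)))
```

With a fully-connected graph, W would be the uniform averaging matrix and every step would coincide with centralized SGD on the average loss; the spectral gap of W controls how fast the consensus distance shrinks on sparser graphs such as the ring.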