This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The results overturn a series of recent works suggesting that decentralization increases instability and that poorly-connected communication graphs harm generalization. On the contrary, we show, for convex, strongly convex and non-convex functions, that D-SGD can always recover generalization bounds analogous to those of classical SGD, suggesting that the choice of graph does not matter. We then argue that this result stems from a worst-case analysis, and we provide a refined data-dependent generalization bound for general convex functions. This new bound reveals that the choice of graph can in fact improve the worst-case bound in certain regimes, and that, surprisingly, a poorly-connected graph can even be beneficial.