In machine learning and neural network optimization, algorithms such as incremental gradient and shuffle SGD are popular because they minimize cache misses and converge well in practice. However, their theoretical optimization properties, especially for non-convex smooth functions, remain incompletely understood. This paper studies the convergence of SGD with arbitrary data ordering within a broad framework for non-convex smooth functions. We obtain improved convergence guarantees for incremental gradient and single-shuffle SGD. In particular, if $n$ is the training set size, we improve the optimization term of the convergence guarantee for reaching accuracy $\varepsilon$ by a factor of $n$, from $O(n / \varepsilon)$ to $O(1 / \varepsilon)$.
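The schemes named above differ only in the order in which training samples are visited within each epoch. The following is a minimal Python sketch, not the paper's method or analysis, assuming a scalar parameter, a constant step size, and a hypothetical toy finite-sum objective $f(x) = \frac{1}{n}\sum_i \frac{1}{2}(x - a_i)^2$. The function name `sgd_with_ordering` and the mode labels are illustrative only.

```python
import random
from typing import Callable, List


def sgd_with_ordering(
    grads: List[Callable[[float], float]],
    x0: float,
    lr: float,
    epochs: int,
    ordering: str = "incremental",
    seed: int = 0,
) -> float:
    """Epoch-based SGD on f(x) = (1/n) * sum_i f_i(x) (hypothetical helper).

    Every sample is visited exactly once per epoch, in an order set by `ordering`:
      - "incremental":      fixed order 0, 1, ..., n-1 in every epoch
      - "single_shuffle":   one random permutation drawn once and reused
      - "random_reshuffle": a fresh random permutation drawn every epoch
    """
    if ordering not in ("incremental", "single_shuffle", "random_reshuffle"):
        raise ValueError(f"unknown ordering: {ordering!r}")
    rng = random.Random(seed)
    perm = list(range(len(grads)))
    if ordering == "single_shuffle":
        rng.shuffle(perm)  # drawn once, before the first epoch
    x = x0
    for _ in range(epochs):
        if ordering == "random_reshuffle":
            rng.shuffle(perm)  # redrawn at the start of each epoch
        for i in perm:
            x -= lr * grads[i](x)  # one step per sample, constant step size
    return x


# Toy problem (assumed): f_i(x) = 0.5 * (x - a_i)^2, minimized at mean(a) = 2.5
a = [1.0, 2.0, 3.0, 4.0]
grads = [lambda x, ai=ai: x - ai for ai in a]
for mode in ("incremental", "single_shuffle", "random_reshuffle"):
    print(mode, round(sgd_with_ordering(grads, 0.0, 0.01, 500, mode), 4))
```

All three orderings end close to the minimizer 2.5. With a constant step size, the iterate settles in a small neighborhood whose size depends on the step size rather than converging exactly.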