Stochastic Gradient Descent (SGD) and its Ruppert-Polyak averaged variant (ASGD) lie at the heart of modern large-scale learning, yet their theoretical properties in high-dimensional settings remain poorly understood. In this paper, we provide rigorous statistical guarantees for constant learning-rate SGD and ASGD in high-dimensional regimes. Our key innovation is to transfer powerful tools from high-dimensional time series to online learning. Specifically, by viewing SGD as a nonlinear autoregressive process and adapting existing coupling techniques, we prove the geometric-moment contraction of high-dimensional SGD with constant learning rates, thereby establishing the asymptotic stationarity of the iterates. Building on this, we derive the $q$-th moment convergence of SGD and ASGD for any $q\ge2$ in general $\ell^s$-norms and, in particular, in the $\ell^{\infty}$-norm that is frequently adopted in high-dimensional sparse or structured models. Furthermore, we provide a sharp high-probability concentration analysis, which yields probabilistic bounds for high-dimensional ASGD. Beyond closing a critical gap in SGD theory, our proposed framework offers a novel toolkit for analyzing a broad class of high-dimensional learning algorithms.
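To fix ideas, a minimal sketch of the recursions discussed above may help; the symbols $\eta$, $F$, $\xi_k$, and $\theta_k$ are illustrative notation not fixed in the abstract. With a constant learning rate $\eta>0$ and i.i.d. data $\xi_1,\xi_2,\dots$, the SGD iterates and their Ruppert-Polyak average take the form
$$
\theta_k \;=\; \theta_{k-1} - \eta\,\nabla F(\theta_{k-1},\xi_k), \qquad \bar\theta_n \;=\; \frac{1}{n}\sum_{k=1}^{n}\theta_k,
$$
so that $\theta_k = G(\theta_{k-1},\xi_k)$ with $G(\theta,\xi)=\theta-\eta\,\nabla F(\theta,\xi)$, i.e., a nonlinear autoregressive process driven by i.i.d. innovations, which is the viewpoint underlying the geometric-moment contraction argument.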