Randomized subspace methods reduce per-iteration cost; however, in nonconvex optimization, most analyses are expectation-based, and high-probability bounds remain scarce even under sub-Gaussian noise. We first prove that randomized subspace SGD (RS-SGD) admits a high-probability convergence bound under sub-Gaussian noise, achieving the same order of oracle complexity as prior in-expectation results. Motivated by the prevalence of heavy-tailed gradients in modern machine learning, we then propose randomized subspace normalized SGD (RS-NSGD), which integrates direction normalization into subspace updates. Assuming the noise has bounded $p$-th moments, we establish both in-expectation and high-probability convergence guarantees, and show that RS-NSGD can achieve better oracle complexity than full-dimensional normalized SGD.
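To make the update concrete, below is a minimal NumPy sketch of an RS-NSGD-style iteration: at each step a random Gaussian sketch maps the stochastic gradient into a low-dimensional subspace, the subspace direction is normalized, and the normalized step is mapped back to the full space. The names (`rs_nsgd`, `grad_fn`), the sketching scale, and the heavy-tailed toy noise are illustrative assumptions; the paper's exact sketching distribution, scaling, and step-size schedule may differ.

```python
import numpy as np

def rs_nsgd(grad_fn, x0, step_size=0.05, subspace_dim=10, num_iters=1000, seed=0):
    """Illustrative sketch of randomized subspace normalized SGD (RS-NSGD).

    Each iteration draws a Gaussian sketch P in R^{d x s}, projects the
    stochastic gradient into the s-dimensional subspace, normalizes the
    subspace direction, and maps the normalized step back to R^d.
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    d = x.size
    for _ in range(num_iters):
        g = grad_fn(x, rng)                                # stochastic gradient oracle
        P = rng.standard_normal((d, subspace_dim)) / np.sqrt(subspace_dim)
        u = P.T @ g                                        # project gradient into subspace
        norm = np.linalg.norm(u)
        if norm > 0.0:
            x -= step_size * (P @ (u / norm))              # normalized subspace step
    return x

# Toy usage: f(x) = 0.5 ||x||^2 with heavy-tailed (Student-t) gradient noise,
# mimicking the bounded p-th moment regime the abstract refers to.
if __name__ == "__main__":
    def grad_fn(x, rng):
        return x + rng.standard_t(df=3, size=x.size)

    x_final = rs_nsgd(grad_fn, x0=np.ones(100), step_size=0.05,
                      subspace_dim=10, num_iters=2000)
    print("final ||x|| =", np.linalg.norm(x_final))
```

Normalizing the projected direction caps the influence of any single heavy-tailed gradient sample on the iterate, which is the intuition behind the bounded $p$-th moment guarantees described above.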