Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm with a least-squares loss function for spherical data fitting. T-kernel SGD employs a "truncation" operation, enabling the application of series-based kernel functions in stochastic gradient descent, thereby avoiding the difficulty of finding suitable closed-form kernel functions in high-dimensional spaces. In contrast to traditional kernel SGD, T-kernel SGD balances bias and variance more effectively by dynamically adjusting the hypothesis space during the iterations. The most significant advantage of the proposed algorithm is that it achieves theoretically optimal convergence rates with a constant step size (independent of the sample size) while overcoming the inherent saturation problem of kernel SGD. Additionally, we leverage the structure of spherical polynomials to derive an equivalent formulation of T-kernel SGD, significantly reducing storage and computational costs compared to kernel SGD. Typically, T-kernel SGD requires only $\mathcal{O}(n^{1+\frac{d}{d-1}\epsilon})$ computational complexity and $\mathcal{O}(n^{\frac{d}{d-1}\epsilon})$ storage to achieve optimal rates on the $d$-dimensional sphere, where $0<\epsilon<\frac{1}{2}$ can be arbitrarily small if the optimal fit or the underlying space possesses sufficient regularity. This regularity is determined by the smoothness parameter of the objective function and the decay rate of the eigenvalues of the integral operator associated with the kernel function, both of which reflect the difficulty of the estimation problem. Our main results quantitatively characterize how this prior information influences the convergence of T-kernel SGD. Numerical experiments further validate the theoretical findings presented in this paper.
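The core idea of iterating in a dynamically growing truncated basis can be illustrated with a minimal one-dimensional sketch. Here truncated Fourier features on the circle stand in for spherical harmonics, and the target function, step size, and truncation schedule are all illustrative assumptions, not the paper's exact algorithm or rates:

```python
import numpy as np

# Toy analogue of T-kernel SGD: SGD with least-squares loss in a truncated
# orthonormal expansion whose truncation level grows during the iterations.
# All constants below are illustrative assumptions.

rng = np.random.default_rng(0)

def features(x, K):
    """Orthonormal Fourier basis on [0, 2*pi), truncated at frequency K,
    ordered so that lower truncation levels are prefixes of higher ones."""
    phi = [1.0 / np.sqrt(2 * np.pi)]
    for k in range(1, K + 1):
        phi.append(np.cos(k * x) / np.sqrt(np.pi))
        phi.append(np.sin(k * x) / np.sqrt(np.pi))
    return np.array(phi)

def target(x):
    # Hypothetical smooth regression function to recover.
    return np.sin(x) + 0.5 * np.cos(2 * x)

n, K_max, step = 2000, 4, 0.05       # constant step size, independent of n
theta = np.zeros(2 * K_max + 1)      # coefficients of the truncated expansion

for t in range(n):
    K_t = min(K_max, 1 + t // 250)   # truncation level grows with t
    m = 2 * K_t + 1                  # active coordinates at this level
    x = rng.uniform(0.0, 2 * np.pi)
    y = target(x) + 0.1 * rng.normal()
    phi = features(x, K_t)
    resid = theta[:m] @ phi - y      # least-squares residual
    theta[:m] -= step * resid * phi  # SGD step restricted to the active space

# Evaluate the fit on a dense grid.
grid = np.linspace(0.0, 2 * np.pi, 400, endpoint=False)
pred = np.array([features(x, K_max) @ theta for x in grid])
mse = np.mean((pred - target(grid)) ** 2)
```

Because only the first $2K_t+1$ coordinates are touched at step $t$, per-iteration cost and storage scale with the current truncation level rather than with the number of samples seen, which is the mechanism behind the complexity savings claimed in the abstract.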