In this paper, we propose a novel kernel stochastic gradient descent (SGD) algorithm for large-scale supervised learning with general losses. Compared to traditional kernel SGD, our algorithm improves efficiency and scalability through an innovative regularization strategy. By leveraging the infinite series expansion of spherical radial basis functions, this strategy projects the stochastic gradient onto a finite-dimensional hypothesis space, which is adaptively scaled according to the bias-variance trade-off, thereby enhancing generalization performance. Based on a new estimation of the spectral structure of the kernel-induced covariance operator, we develop an analytical framework that unifies optimization and generalization analyses. We prove that both the last iterate and the suffix average converge at minimax-optimal rates, and we further establish optimal strong convergence in the reproducing kernel Hilbert space. Our framework accommodates a broad class of classical loss functions, including least-squares, Huber, and logistic losses. Moreover, the proposed algorithm significantly reduces computational complexity and achieves optimal storage complexity by incorporating coordinate-wise updates from linear SGD, thereby avoiding the costly pairwise operations typical of kernel SGD and enabling efficient processing of streaming data. Finally, extensive numerical experiments demonstrate the efficiency of our approach.
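The core efficiency idea above, namely replacing pairwise kernel evaluations with coordinate-wise linear-SGD updates in a finite-dimensional hypothesis space, can be illustrated with a minimal sketch. This is not the paper's exact method (which uses a spherical radial basis function expansion with adaptive dimension scaling); it substitutes a standard random Fourier feature map as the finite-dimensional projection, and all dimensions, step sizes, and the synthetic target are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: kernel SGD restricted to a finite-dimensional
# hypothesis space via random Fourier features (a stand-in for the paper's
# spherical radial basis expansion). Each update touches a D-dimensional
# coefficient vector, so per-sample cost and storage are O(D), independent
# of the number of samples seen -- no pairwise kernel evaluations.

rng = np.random.default_rng(0)

d, D = 5, 200                      # input dimension, hypothesis-space dimension (assumed)
W = rng.normal(size=(D, d))        # random frequencies approximating a Gaussian kernel
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def features(x):
    """Project x into the finite-dimensional hypothesis space."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

theta = np.zeros(D)                # model coefficients, O(D) storage

# Streaming pass: one sample at a time, least-squares loss,
# decaying step size (illustrative schedule).
for t in range(1, 2001):
    x = rng.normal(size=d)
    y = np.sin(x.sum())                   # synthetic streaming target
    phi = features(x)
    grad = (theta @ phi - y) * phi        # stochastic gradient of squared loss
    theta -= (0.1 / np.sqrt(t)) * grad    # coordinate-wise linear-SGD update

# Prediction on a new point also costs O(D).
x_new = rng.normal(size=d)
y_hat = theta @ features(x_new)
```

The same loop accommodates other losses from the abstract's class (e.g. Huber or logistic) by swapping the gradient expression; the projection and coordinate-wise update structure are unchanged.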