Kernel methods are a popular class of nonlinear predictive models in machine learning. Scalable algorithms for learning kernel models need to be iterative, but their convergence can be slow due to poor conditioning. Spectral preconditioning is an important tool for speeding up the convergence of such iterative training algorithms. However, computing and storing a spectral preconditioner can be expensive, and the resulting computational and storage overheads can preclude the application of kernel methods to large datasets. A Nyström approximation of the spectral preconditioner is often cheaper to compute and store, and has been used successfully in practical applications. In this paper, we analyze the trade-offs of using such an approximated preconditioner. Specifically, we show that a sample of logarithmic size (as a function of the size of the dataset) enables the Nyström-based approximated preconditioner to accelerate gradient descent nearly as well as the exact preconditioner, while also reducing the computational and storage overheads.
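To make the setting concrete, the following is a minimal NumPy sketch of gradient descent for kernel regression with a Nyström-approximated spectral preconditioner. It is an illustration under stated assumptions, not the construction analyzed in the paper: the Laplacian kernel, the synthetic data, the sizes `n`, `s`, `q`, and the helper names `laplacian_kernel`, `nystrom_preconditioner`, and `gradient_descent` are all hypothetical choices made for this example. The iteration is written in coefficient space, the extended eigenvectors are orthonormalized with a QR factorization as a simplification, and the step size is obtained from a dense eigenvalue computation that is only affordable at demo scale.

```python
import numpy as np


def laplacian_kernel(A, B, bandwidth=1.0):
    """k(a, b) = exp(-||a - b|| / bandwidth) for all pairs of rows of A and B."""
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return np.exp(-dists / bandwidth)


def nystrom_preconditioner(X, s, q, rng, bandwidth=1.0):
    """Return (Q, w) such that P = I - Q diag(w) Q^T, built from s sampled points.

    Only the s x s and n x s kernel blocks are formed, never the full n x n matrix.
    """
    n = X.shape[0]
    idx = rng.choice(n, size=s, replace=False)
    K_ss = laplacian_kernel(X[idx], X[idx], bandwidth)
    evals, evecs = np.linalg.eigh(K_ss)            # eigenvalues in ascending order
    evals = evals[::-1][: q + 1]                   # top q+1, descending
    evecs = evecs[:, ::-1][:, : q + 1]
    lam = (n / s) * evals                          # Nystrom estimates of the top eigenvalues of K
    tau = lam[q]                                   # estimate of the (q+1)-th eigenvalue
    K_ns = laplacian_kernel(X, X[idx], bandwidth)  # n x s block
    # Extend the subsample eigenvectors to all n points, then orthonormalize
    # (a simplification assumed for this sketch).
    Q, _ = np.linalg.qr(K_ns @ evecs[:, :q])
    w = 1.0 - tau / lam[:q]                        # damping weights in [0, 1)
    return Q, w


def gradient_descent(K, y, step, n_iters, precond=None):
    """Iterate alpha <- alpha - step * P (K alpha - y); P = I if precond is None."""
    alpha = np.zeros_like(y)
    for _ in range(n_iters):
        direction = K @ alpha - y                  # residual of the square loss
        if precond is not None:
            Q, w = precond
            direction = direction - Q @ (w * (Q.T @ direction))  # apply P in O(nq)
        alpha -= step * direction
    return alpha


rng = np.random.default_rng(0)
n, s, q = 400, 80, 20                              # hand-picked demo sizes
X = rng.uniform(size=(n, 2))
y = np.sin(4.0 * X[:, 0]) + 0.5 * rng.standard_normal(n)
K = laplacian_kernel(X, X)

Q, w = nystrom_preconditioner(X, s, q, rng)

# Demo-only step sizes from dense eigenvalue computations.
P = np.eye(n) - (Q * w) @ Q.T
lam_plain = np.linalg.eigvalsh(K)[-1]
lam_pre = np.linalg.eigvals(P @ K).real.max()

n_iters = 200
alpha_plain = gradient_descent(K, y, 1.0 / lam_plain, n_iters)
alpha_pre = gradient_descent(K, y, 1.0 / lam_pre, n_iters, precond=(Q, w))

res_plain = np.linalg.norm(K @ alpha_plain - y)
res_pre = np.linalg.norm(K @ alpha_pre - y)
print(f"largest eigenvalue: plain {lam_plain:.3g}, preconditioned {lam_pre:.3g}")
print(f"training residual after {n_iters} steps: plain {res_plain:.3g}, "
      f"preconditioned {res_pre:.3g}")
```

The preconditioner is stored as the pair `(Q, w)`, which takes O(nq) memory and is applied with two thin matrix-vector products per step. Because it damps the estimated top `q` eigendirections, the largest eigenvalue of the preconditioned system is smaller and a larger step size becomes admissible. The sample size `s` is fixed by hand here, so the sketch does not test the logarithmic scaling stated in the abstract.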