Classical kernel machines have historically faced significant challenges in scaling to large datasets and model sizes, a key ingredient that has driven the success of neural networks. In this paper, we present a new methodology for building kernel machines that scale efficiently with both data size and model size. Our algorithm introduces delayed projections into Preconditioned Stochastic Gradient Descent (PSGD), enabling the training of much larger models than was previously feasible and pushing the practical limits of kernel-based learning. We validate our algorithm, EigenPro4, across multiple datasets, demonstrating substantial training speedups over existing methods while maintaining comparable or better classification accuracy.
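To make the delayed-projection idea concrete, the sketch below trains a kernel regression model f(x) = K(x, Z) alpha with SGD, accumulating updates on temporary centers (the batch points) and projecting them back onto the span of the fixed centers Z only every few steps, rather than after every update. This is a minimal illustration under our own assumptions, not the authors' implementation: the EigenPro preconditioner is omitted for brevity, and all function names and hyperparameters (`gaussian_kernel`, `project_every`, `lr`) are illustrative.

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    # Pairwise Gaussian kernel: K[i, j] = exp(-||x_i - z_j||^2 / (2 h^2)).
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def sgd_delayed_projection(X, y, Z, lr=0.1, batch_size=64,
                           project_every=10, epochs=1, seed=0):
    """Train f(x) = K(x, Z) @ alpha with SGD and delayed projection.

    Gradient steps add the batch points as temporary centers; every
    `project_every` steps the temporary component is folded back into
    alpha by a least-squares projection onto span{K(., z) : z in Z}.
    (Illustrative sketch only; the preconditioning step of PSGD is omitted.)
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    alpha = np.zeros(len(Z))          # weights on the fixed model centers Z
    tmp_X, tmp_w = [], []             # temporary centers and their weights
    K_ZZ = gaussian_kernel(Z, Z)      # Gram matrix used for the projection
    step = 0
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            Xb, yb = X[idx], y[idx]
            # Predict with the fixed centers plus all unprojected temp centers.
            pred = gaussian_kernel(Xb, Z) @ alpha
            if tmp_X:
                pred += gaussian_kernel(Xb, np.concatenate(tmp_X)) \
                        @ np.concatenate(tmp_w)
            # SGD step: record batch points as temporary centers instead of
            # projecting onto Z immediately.
            tmp_X.append(Xb)
            tmp_w.append(-lr * (pred - yb) / len(idx))
            step += 1
            if step % project_every == 0:
                # Delayed projection: solve K_ZZ @ delta = K(Z, tmp) @ w,
                # the RKHS least-squares projection onto the model centers.
                K_Zt = gaussian_kernel(Z, np.concatenate(tmp_X))
                delta = np.linalg.lstsq(K_ZZ, K_Zt @ np.concatenate(tmp_w),
                                        rcond=None)[0]
                alpha += delta
                tmp_X, tmp_w = [], []
    return alpha
```

Deferring the projection amortizes its cost over many cheap SGD steps, which is what allows the model size (number of centers in Z) to grow without the per-step cost growing with it.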