Recent studies indicate that kernel machines can often perform similarly to or better than deep neural networks (DNNs) on small datasets. Interest in kernel machines has been further bolstered by the discovery of their equivalence to wide neural networks in certain regimes. However, a key feature of DNNs is their ability to scale model size and training data size independently, whereas in traditional kernel machines model size is tied to data size. Because of this coupling, scaling kernel machines to large data has been computationally challenging. In this paper, we provide a way forward for constructing large-scale general kernel models, a generalization of kernel machines that decouples the model from the data and thereby allows training on large datasets. Specifically, we introduce EigenPro 3.0, an algorithm based on projected dual preconditioned SGD, and show scaling to model and data sizes that have not been possible with existing kernel methods.
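To make the decoupling concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a general kernel model of the form f(x) = Σ_j α_j K(x, z_j), where the p centers z_j are chosen independently of the n training points, and it trains the weights with plain minibatch SGD rather than the projected dual preconditioned SGD of EigenPro 3.0. The Gaussian kernel, the toy 1-D regression task, and all names (`gaussian_kernel`, `GeneralKernelModel`, `train_demo`) are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(X, Z, bandwidth=1.0):
    """Return K with K[i, j] = exp(-||X[i] - Z[j]||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        - 2.0 * X @ Z.T
        + np.sum(Z**2, axis=1)[None, :]
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


class GeneralKernelModel:
    """Hypothetical model f(x) = sum_j alpha_j K(x, z_j) with p free centers z_j.

    The number of centers p is the model size; it does not depend on the
    number of training points n.
    """

    def __init__(self, centers, bandwidth=1.0):
        self.centers = centers
        self.bandwidth = bandwidth
        self.alpha = np.zeros(centers.shape[0])

    def predict(self, X):
        return gaussian_kernel(X, self.centers, self.bandwidth) @ self.alpha

    def sgd_step(self, X_batch, y_batch, lr):
        # Gradient of 0.5 * mean squared error with respect to alpha.
        K = gaussian_kernel(X_batch, self.centers, self.bandwidth)
        residual = K @ self.alpha - y_batch
        self.alpha -= lr * (K.T @ residual) / len(y_batch)


def train_demo(n=2000, p=50, epochs=20, batch_size=100, seed=0):
    """Fit p centers to n >> p samples of a toy 1-D target with plain SGD."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, 1))
    y = np.sin(3.0 * X[:, 0])

    # Centers live on a fixed grid, independent of the training data.
    centers = np.linspace(-1.0, 1.0, p)[:, None]
    model = GeneralKernelModel(centers, bandwidth=0.3)

    # Gaussian kernel entries are at most 1, so the largest eigenvalue of
    # K^T K / batch_size is at most p; lr = 1 / p keeps each step stable.
    lr = 1.0 / p

    initial_mse = float(np.mean((model.predict(X) - y) ** 2))
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            model.sgd_step(X[idx], y[idx], lr)
    final_mse = float(np.mean((model.predict(X) - y) ** 2))
    return model, initial_mse, final_mse


if __name__ == "__main__":
    model, initial_mse, final_mse = train_demo()
    print(f"centers: {model.alpha.shape[0]}, "
          f"MSE before: {initial_mse:.4f}, after: {final_mse:.4f}")
```

In this sketch, `n` and `p` can be varied independently: the per-step cost depends on the batch size and `p`, not on `n`. A traditional kernel machine would instead place one center on every training point, forcing `p = n`.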