Recent studies indicate that kernel machines can often perform similarly to or better than deep neural networks (DNNs) on small datasets. Interest in kernel machines has been further bolstered by the discovery of their equivalence to wide neural networks in certain regimes. However, a key feature of DNNs is the ability to scale model size and training data size independently, whereas in traditional kernel machines model size is tied to data size. Because of this coupling, scaling kernel machines to large data has been computationally challenging. In this paper, we provide a way forward for constructing large-scale general kernel models, a generalization of kernel machines that decouples the model from the data and thereby allows training on large datasets. Specifically, we introduce EigenPro 3.0, an algorithm based on projected dual preconditioned SGD, and show scaling to model and data sizes that have not been possible with existing kernel methods.
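To make the decoupling of model size from data size concrete, the following is a minimal sketch of a general kernel model whose p centers are chosen independently of the n training points. It is not EigenPro 3.0: the weights are fitted with plain mini-batch SGD on squared loss, with no preconditioning or projection. The Gaussian kernel, the names `gaussian_kernel`, `GeneralKernelModel`, and `sgd_step`, the synthetic data, and all hyperparameters are assumptions made for illustration only.

```python
import numpy as np


def gaussian_kernel(X, Z, bandwidth=1.0):
    """Return K[i, j] = exp(-||X[i] - Z[j]||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        - 2.0 * X @ Z.T
        + np.sum(Z**2, axis=1)[None, :]
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


class GeneralKernelModel:
    """f(x) = sum_j alpha[j] * K(x, centers[j]).

    The centers are free parameters of the model and need not be training
    points, so the model size p is independent of the data size n.
    """

    def __init__(self, centers, bandwidth=1.0):
        self.centers = centers
        self.bandwidth = bandwidth
        self.alpha = np.zeros(centers.shape[0])

    def predict(self, X):
        return gaussian_kernel(X, self.centers, self.bandwidth) @ self.alpha

    def sgd_step(self, X_batch, y_batch, lr):
        # Plain gradient step on half the mean squared error over the batch,
        # taken with respect to the weights alpha (no preconditioning).
        K = gaussian_kernel(X_batch, self.centers, self.bandwidth)
        residual = K @ self.alpha - y_batch
        self.alpha -= lr * (K.T @ residual) / len(y_batch)


# Synthetic regression problem: n training points, but only p << n centers.
rng = np.random.default_rng(0)
n, p, d = 2000, 50, 1
X = rng.uniform(-3.0, 3.0, size=(n, d))
y = np.sin(X[:, 0])
centers = rng.uniform(-3.0, 3.0, size=(p, d))

model = GeneralKernelModel(centers, bandwidth=0.5)
initial_mse = np.mean((model.predict(X) - y) ** 2)

batch_size = 100
for epoch in range(20):
    perm = rng.permutation(n)
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]
        model.sgd_step(X[idx], y[idx], lr=0.02)

final_mse = np.mean((model.predict(X) - y) ** 2)
print(f"centers: {p}, training points: {n}")
print(f"MSE before training: {initial_mse:.4f}, after: {final_mse:.4f}")
```

In a traditional kernel machine the centers would be the training points themselves, so the weight vector would have length n rather than p. Here the two sizes can be varied independently, which is the property the abstract attributes to general kernel models.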