Recent studies indicate that kernel machines can often perform similarly to or better than deep neural networks (DNNs) on small datasets. Interest in kernel machines has been further bolstered by the discovery of their equivalence to wide neural networks in certain regimes. However, a key feature of DNNs is their ability to scale model size and training data size independently, whereas in traditional kernel machines model size is tied to data size. Because of this coupling, scaling kernel machines to large data has been computationally challenging. In this paper, we provide a way forward for constructing large-scale general kernel models, a generalization of kernel machines that decouples the model from the data, allowing training on large datasets. Specifically, we introduce EigenPro 3.0, an algorithm based on projected dual preconditioned SGD, and show scaling to model and data sizes that have not been possible with existing kernel methods.
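To make the decoupling of model size from data size concrete, the following is a minimal sketch of a general kernel model of the form f(x) = sum_j alpha_j K(x, z_j), where the p centers z_j are chosen independently of the n training points. It is not EigenPro 3.0: it uses plain mini-batch SGD with no projection or preconditioning. The Gaussian kernel, the names `gaussian_kernel` and `GeneralKernelModel`, the synthetic data, and all hyperparameters are assumptions made for illustration only.

```python
import numpy as np


def gaussian_kernel(X, Z, bandwidth=1.0):
    """Pairwise Gaussian kernel: K[i, j] = exp(-||x_i - z_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


class GeneralKernelModel:
    """Hypothetical illustration: f(x) = sum_j alpha_j K(x, z_j) with p free centers."""

    def __init__(self, centers, bandwidth=1.0):
        self.centers = centers                    # shape (p, d), independent of the data
        self.bandwidth = bandwidth
        self.alpha = np.zeros(centers.shape[0])   # one weight per center, not per sample

    def predict(self, X):
        return gaussian_kernel(X, self.centers, self.bandwidth) @ self.alpha

    def sgd_step(self, X_batch, y_batch, lr):
        # Gradient of the mean squared error (1 / 2m) * ||K alpha - y||^2 w.r.t. alpha.
        K = gaussian_kernel(X_batch, self.centers, self.bandwidth)
        residual = K @ self.alpha - y_batch
        self.alpha -= lr * (K.T @ residual) / len(y_batch)


# Synthetic regression problem (assumed for the example): n samples, p << n centers.
rng = np.random.default_rng(0)
n, p = 2000, 50
X = rng.uniform(-3.0, 3.0, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
centers = np.linspace(-3.0, 3.0, p)[:, None]

model = GeneralKernelModel(centers)
initial_mse = np.mean((model.predict(X) - y) ** 2)

# Every kernel entry is at most 1, so the largest eigenvalue of K^T K / m is at
# most p; a step size of 1 / p is therefore a conservative choice.
lr = 1.0 / p
batch_size = 100
for epoch in range(20):
    perm = rng.permutation(n)
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]
        model.sgd_step(X[idx], y[idx], lr)

final_mse = np.mean((model.predict(X) - y) ** 2)
print(f"model size p = {p}, data size n = {n}")
print(f"training MSE before: {initial_mse:.4f}, after: {final_mse:.4f}")
```

In a traditional kernel machine the centers would be the training points themselves, so the weight vector would have length n. Here it has length p regardless of how much data is streamed through `sgd_step`, which is the sense in which the model and the data are decoupled.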