Neural kernels have drastically improved performance on diverse and nonstandard data modalities, but they require significantly more compute, which has previously limited their application to smaller datasets. In this work, we address this by massively parallelizing their computation across many GPUs. We combine this with a distributed, preconditioned conjugate gradients algorithm to enable kernel regression at large scale (i.e., up to five million examples). With this approach, we study the scaling laws of several neural kernels across many orders of magnitude on the CIFAR-5m dataset. Using data augmentation to expand the original CIFAR-10 training set by a factor of 20, we obtain a test accuracy of 91.2\% (state of the art for a pure kernel method). Moreover, we explore neural kernels on other data modalities, obtaining results on protein and small-molecule prediction tasks that are competitive with state-of-the-art methods.
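To make the solver idea concrete, the sketch below fits kernel regression by solving $(K + \lambda I)\alpha = y$ with preconditioned conjugate gradients on a single machine. It rests on several assumptions that are not from this work: an RBF kernel (`rbf_kernel`) stands in for a neural kernel, the preconditioner is a simple Jacobi (diagonal) one, and the names `pcg_solve`, `reg`, and `tol` are hypothetical. The work's own preconditioner and distributed setup may differ. What carries over is the structure. CG only touches $K$ through matrix-vector products, and that product is the step one would shard across GPUs at scale.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Hypothetical stand-in for a neural kernel: squared-exponential kernel.
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * lengthscale**2))

def pcg_solve(K, y, reg=1e-1, tol=1e-8, max_iter=1000):
    """Solve (K + reg*I) alpha = y with Jacobi-preconditioned CG (a sketch)."""
    inv_diag = 1.0 / (np.diag(K) + reg)      # Jacobi preconditioner M^{-1}
    matvec = lambda v: K @ v + reg * v       # the only access to K; parallelizable
    x = np.zeros_like(y)
    r = y - matvec(x)                        # initial residual
    z = inv_diag * r                         # preconditioned residual
    p = z.copy()                             # initial search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = matvec(p)
        step = rz / (p @ Ap)                 # exact line search along p
        x += step * p
        r -= step * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(y):
            break                            # relative residual small enough
        z = inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p            # new conjugate direction
        rz = rz_new
    return x

# Toy usage on synthetic data (illustrative only, not results from this work).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = np.sin(X_train[:, 0])
X_test = rng.normal(size=(50, 5))
K = rbf_kernel(X_train, X_train)
alpha = pcg_solve(K, y_train, reg=0.1)
y_pred = rbf_kernel(X_test, X_train) @ alpha   # predictions: K(test, train) @ alpha
```

For classification, one would typically solve one such system per one-hot target column. Because CG never forms $(K + \lambda I)^{-1}$ explicitly, memory stays at one copy of $K$ (or its shards). This is what makes iterative solvers attractive when $n$ reaches millions and a direct Cholesky factorization becomes impractical.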