Standard infinite-width limits of neural networks sacrifice the ability of intermediate layers to learn representations from data. Recent work (Yang et al., 2023, "A theory of representation learning gives a deep generalisation of kernel methods") modified the Neural Network Gaussian Process (NNGP) limit of Bayesian neural networks so that representation learning is retained. Furthermore, they found that applying this modified limit to a deep Gaussian process gives a practical learning algorithm, which they dubbed the deep kernel machine (DKM). However, they only considered the simplest possible setting: regression in small, fully connected networks with, e.g., 10 input features. Here, we introduce convolutional deep kernel machines. This required us to develop a novel inter-domain inducing point approximation, as well as to introduce and experimentally assess a number of techniques not previously seen in DKMs, including analogues of batch normalisation, different likelihoods, and different types of top layer. The resulting model trains in roughly 77 GPU hours, achieving around 99% test accuracy on MNIST, 72% on CIFAR-100, and 92.7% on CIFAR-10, which is state-of-the-art (SOTA) for kernel methods.