In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite data set with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of the number of samples and hence move beyond worst-case memorization capacity bounds. We verify our theoretical result with numerical experiments and additionally investigate the effectiveness of the algorithm on MNIST and CIFAR-10.
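The abstract does not spell out the algorithm's internals, so the following is only a hedged illustration of the kind of construction it describes: a randomized, polynomial-time procedure that draws random hidden layers for a three-layer ReLU network, fits the output layer by least squares, and retries with fresh randomness until the two-class dataset is interpolated. The function `random_interpolator` and its parameters (`width`, `max_tries`) are hypothetical names introduced here; this is not the paper's method.

```python
import numpy as np

def random_interpolator(X, y, width=256, max_tries=10, seed=0):
    """Randomized search for a three-layer ReLU network interpolating
    a two-class dataset with labels in {-1, +1}.

    Illustrative sketch only: two random hidden layers, output layer
    fit by least squares, with retries on failure.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    for _ in range(max_tries):
        # First hidden layer: random Gaussian weights and biases.
        W1 = rng.standard_normal((d, width)) / np.sqrt(d)
        b1 = rng.standard_normal(width)
        H1 = np.maximum(X @ W1 + b1, 0.0)  # ReLU activations
        # Second hidden layer: again random.
        W2 = rng.standard_normal((width, width)) / np.sqrt(width)
        b2 = rng.standard_normal(width)
        H2 = np.maximum(H1 @ W2 + b2, 0.0)
        # Output layer: least-squares fit of the labels on the features.
        v, *_ = np.linalg.lstsq(H2, y, rcond=None)
        # Success means the network's sign matches every label exactly.
        if np.all(np.sign(H2 @ v) == y):
            return W1, b1, W2, b2, v
    raise RuntimeError("no interpolating network found; try a larger width")

# Usage on synthetic two-class data: with width exceeding the sample
# count, the random features are generically full rank and the least-
# squares fit interpolates with high probability.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = np.sign(rng.standard_normal(200))
params = random_interpolator(X, y, width=256)
```

In this toy version the success probability is governed by the rank of the random feature matrix; the paper's actual guarantee is finer, tying the required number of parameters to the geometry of the two classes rather than to the sample count.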