In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of the number of samples and hence move beyond worst-case memorization capacity bounds. We illustrate the effectiveness of the algorithm in non-pathological situations with extensive numerical experiments and link the insights back to the theoretical results.
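For concreteness, the worst-case notion of memorization capacity described above can be written as follows. This is only a sketch in notation that the abstract itself does not fix: $\mathcal{F}$ stands for the set of functions realizable by a given architecture, $d$ for the input dimension, and labels are assumed to lie in $\{-1,+1\}$; the requirement that the points be pairwise distinct is likewise an assumption, and individual works may impose other conditions such as general position.

```latex
% Hypothetical notation: \mathcal{F} = functions realizable by the architecture,
% d = input dimension, labels in \{-1,+1\} (assumptions, not fixed by the abstract).
\[
  \operatorname{cap}(\mathcal{F})
  \;=\;
  \max\Bigl\{\, N \in \mathbb{N} \;:\;
    \forall\, x_1,\dots,x_N \in \mathbb{R}^d \text{ pairwise distinct},\;
    \forall\, y \in \{-1,+1\}^N,\;
    \exists\, f \in \mathcal{F} \text{ with } f(x_i) = y_i \text{ for all } i
  \,\Bigr\}
\]
```

The two universal quantifiers are what make this a worst-case quantity. The instance-specific viewpoint adopted here instead fixes a single dataset $(x_i, y_i)_{i=1}^N$ and asks how large a network must be to interpolate that dataset alone.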