The low-dimensional manifold hypothesis posits that the data found in many applications, such as those involving natural images, lie (approximately) on low-dimensional manifolds embedded in a high-dimensional Euclidean space. In this setting, a typical neural network defines a function that takes a finite number of vectors in the embedding space as input. However, one often needs to evaluate the optimized network at points outside the training distribution. This paper considers the case in which the training data are distributed in a linear subspace of $\mathbb R^d$. We derive estimates on the variation of the learned function, defined by a neural network, in the direction transversal to the subspace. We study the potential regularization effects associated with the network's depth and with noise in the codimension of the data manifold. We also present additional side effects in training due to the presence of noise.
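To make the quantity under study concrete, the following is a minimal sketch of how one might measure the variation of a network in a direction transversal to a linear data subspace. It is an illustration under stated assumptions, not the paper's method: the subspace is taken to be the span of the first `k` coordinate axes, the network is an untrained random-weight ReLU MLP, and the names `transversal_variation` and `make_relu_mlp`, the dimensions, and the finite-difference step `eps` are all hypothetical choices.

```python
import numpy as np

def transversal_variation(f, x, normal, eps=1e-3):
    """Finite-difference estimate of |f(x + eps*n) - f(x)| / eps for the unit vector n."""
    n = normal / np.linalg.norm(normal)
    return abs(f(x + eps * n) - f(x)) / eps

def make_relu_mlp(widths, rng):
    """Random-weight ReLU network R^d -> R (untrained; for illustration only)."""
    params = [(rng.standard_normal((m, n)) / np.sqrt(n), np.zeros(m))
              for n, m in zip(widths[:-1], widths[1:])]

    def f(x):
        h = x
        for W, b in params[:-1]:
            h = np.maximum(W @ h + b, 0.0)  # hidden layers with ReLU
        W, b = params[-1]
        return float((W @ h + b)[0])        # scalar linear output layer

    return f

rng = np.random.default_rng(0)
d, k = 10, 3                      # ambient and subspace dimensions (assumed values)

x = np.zeros(d)
x[:k] = rng.standard_normal(k)    # a point in the subspace span(e_1, ..., e_k)

normal = np.zeros(d)
normal[-1] = 1.0                  # a direction orthogonal to that subspace

f = make_relu_mlp([d, 32, 32, 1], rng)
variation = transversal_variation(f, x, normal)

# Sanity check: a function that reads only the first k coordinates
# does not change when x is moved off the subspace.
g = lambda z: float(np.sum(z[:k] ** 2))
variation_g = transversal_variation(g, x, normal)
```

Training data confined to the subspace never constrain how `f` depends on the last `d - k` coordinates, so `variation` reflects only the weights in those directions. This is the kind of off-subspace behavior the abstract's estimates concern.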