We prove the converse of the universal approximation theorem: a neural network (NN) encoding theorem showing that, for every stably converged NN with continuous activation functions, the weight matrix encodes a continuous function that approximates the training dataset to within a finite margin of error over a bounded domain. We further show that applying the Eckart-Young theorem to the truncated singular value decomposition of each NN layer's weight matrix illuminates both the latent space manifold of the training dataset that the layer encodes and represents, and the geometric nature of the mathematical operations the layer performs. Our results have implications for understanding how NNs break the curse of dimensionality by harnessing memory capacity for expressivity, and indicate that memory capacity and expressivity are complementary. This Layer Matrix Decomposition (LMD) further suggests a close relationship between the eigen-decomposition of NN layers and the latest advances in the conceptualization of Hopfield networks and Transformer NN models.
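As a minimal illustration of the Eckart-Young step mentioned above, the sketch below truncates the singular value decomposition of a single weight matrix and checks the optimal-error identities the theorem guarantees. It is not the paper's LMD procedure, which the abstract does not spell out. The matrix `W` is a random stand-in for a trained layer, and the helper `truncated_svd`, the shape `(64, 32)`, and the rank `k = 8` are all assumptions chosen for illustration.

```python
import numpy as np


def truncated_svd(W, k):
    """Return the rank-k truncation of W and the full vector of singular values.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Thin SVD: W = U @ diag(s) @ Vt, with s sorted in descending order.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the k largest singular triplets.
    W_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return W_k, s


# Random stand-in for one layer's trained weight matrix (assumption).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
k = 8

W_k, s = truncated_svd(W, k)

# Eckart-Young: W_k is the best rank-k approximation of W. Its error equals
# the (k+1)-th singular value in the spectral norm, and the root of the sum
# of the squared discarded singular values in the Frobenius norm.
spectral_err = np.linalg.norm(W - W_k, ord=2)
frobenius_err = np.linalg.norm(W - W_k, ord="fro")

print("spectral error:", spectral_err, "sigma_{k+1}:", s[k])
print("Frobenius error:", frobenius_err, "tail norm:", np.sqrt(np.sum(s[k:] ** 2)))
```

In this sketch the retained right singular vectors (the rows of `Vt[:k]`) span the input directions the layer responds to most strongly, which is one way to read the geometric interpretation the abstract describes. A real analysis would use the weights of a converged network rather than random values.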