Deep neural networks (DNNs) are singular statistical models that exhibit complex degeneracies. In this work, we illustrate how a quantity known as the \emph{learning coefficient}, introduced in singular learning theory, precisely quantifies the degree of degeneracy in deep neural networks. Importantly, we demonstrate that degeneracy in DNNs cannot be accounted for by simply counting the number of "flat" directions. We propose a computationally scalable approximation of a localized version of the learning coefficient using stochastic gradient Langevin dynamics. To validate our approach, we demonstrate its accuracy in low-dimensional models with known theoretical values. We also show that the local learning coefficient correctly recovers the ordering of degeneracy between various parameter regions of interest. An experiment on MNIST shows that the local learning coefficient can reveal the inductive bias of stochastic optimizers toward more or less degenerate critical points.
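As a rough illustration of the validation on low-dimensional models with known theoretical values, the sketch below estimates a learning coefficient for two one-dimensional toy losses, $K(w) = w^2$ and $K(w) = w^4$, whose theoretical learning coefficients are $1/2$ and $1/4$. Several things here are assumptions and are not stated in the abstract: the estimator form $\hat\lambda = n\beta\,(\mathbb{E}[L(w)] - L(w^*))$ with $\beta = 1/\log n$, the quadratic localization term with strength `gamma`, the use of full-gradient Langevin dynamics on a deterministic loss in place of stochastic gradient Langevin dynamics on minibatches, and all names and hyperparameter values (`estimate_llc`, `step`, `n_chains`, and so on), which are hypothetical.

```python
import numpy as np


def estimate_llc(loss, grad, w_star=0.0, n=1000, gamma=1.0, step=1e-4,
                 n_chains=4000, n_steps=3000, burn_in=1000, seed=0):
    """Sketch of a localized learning-coefficient estimate in one dimension.

    Assumed estimator: n * beta * (E[loss(w)] - loss(w_star)), beta = 1 / log(n),
    with the expectation taken over a tempered distribution localized at w_star.
    """
    rng = np.random.default_rng(seed)
    beta = 1.0 / np.log(n)
    # All chains start at the point of interest w_star.
    w = np.full(n_chains, w_star, dtype=float)
    total, count = 0.0, 0
    for t in range(n_steps):
        # Gradient of the potential n*beta*loss(w) + (gamma/2)*(w - w_star)^2.
        drift = n * beta * grad(w) + gamma * (w - w_star)
        # Unadjusted Langevin update (full gradient, so no minibatch noise).
        w = w - 0.5 * step * drift + np.sqrt(step) * rng.standard_normal(n_chains)
        if t >= burn_in:
            total += loss(w).mean()
            count += 1
    return n * beta * (total / count - loss(w_star))


# Regular minimum: theoretical learning coefficient 1/2.
llc_quadratic = estimate_llc(lambda w: w**2, lambda w: 2 * w)
# Degenerate minimum (zero Hessian): theoretical learning coefficient 1/4.
llc_quartic = estimate_llc(lambda w: w**4, lambda w: 4 * w**3)

print(f"w^2: estimate {llc_quadratic:.3f} (theory 0.5)")
print(f"w^4: estimate {llc_quartic:.3f} (theory 0.25)")
```

The quartic case hints at why counting flat directions is not enough: the Hessian of $w^4$ vanishes at the origin, so a Hessian-rank count would assign this minimum a value of zero, whereas its learning coefficient is $1/4$. The finite step size and the localization term introduce a small bias, so the estimates should be read as approximate.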