The goal of this thesis is to improve our understanding of the internal mechanisms by which deep artificial neural networks create meaningful representations and generalize. We focus on the challenge of characterizing the semantic content of the hidden representations with unsupervised learning tools, partially developed by us and described in this thesis, which harness the low-dimensional structure of the data.

Chapter 2 introduces Gride, a method that estimates the intrinsic dimension of the data as an explicit function of the scale, without performing any decimation of the data set. Our approach is based on rigorous distributional results that enable the quantification of the uncertainty of the estimates. Moreover, the method is simple and computationally efficient, since it relies only on the distances between nearest data points.

In Chapter 3, we study the evolution of the probability density across the hidden layers of some state-of-the-art deep neural networks. We find that the initial layers generate a unimodal probability density, discarding any structure irrelevant to classification. In subsequent layers, density peaks arise in a hierarchical fashion that mirrors the semantic hierarchy of the concepts. This process leaves a footprint in the probability density of the output layer, where the topography of the peaks allows reconstructing the semantic relationships between the categories.

In Chapter 4, we study the problem of generalization in deep neural networks: adding parameters to a network that interpolates its training data typically improves its generalization performance, at odds with the classical bias-variance trade-off. We show that wide neural networks learn redundant representations instead of overfitting to spurious correlations, and that redundant neurons appear only if the network is regularized and the training error is zero.
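As a minimal illustration of the idea behind Gride, the sketch below (ours, not the implementation described in Chapter 2, and assuming the data are locally uniform on a d-dimensional manifold) estimates the intrinsic dimension at the scale set by k by maximizing the likelihood of the ratios mu = r_{2k}/r_k between the distances to the 2k-th and k-th nearest neighbors; increasing k probes larger scales, and k = 1 reduces to the closed-form TwoNN estimator. All function names are our own.

```python
# Sketch of a Gride-style, scale-dependent intrinsic dimension (ID) estimate.
# Assumption: for locally uniform data in d dimensions, mu = r_{2k}/r_k
# satisfies mu^{-d} ~ Beta(k, k), so d can be estimated by maximum likelihood
# from the observed ratios. Only nearest-neighbor distances are needed.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.spatial import cKDTree

def gride_like_id(X, k):
    """ID estimate at the scale set by k, from the ratios r_{2k} / r_k."""
    # Distances to the first 2k neighbors (column 0 is the point itself).
    dist, _ = cKDTree(X).query(X, k=2 * k + 1)
    mu = dist[:, 2 * k] / dist[:, k]          # ratios mu_i = r_{2k} / r_k
    log_mu = np.log(mu)

    def neg_log_lik(d):
        # Log-likelihood of the ratios, up to additive constants:
        # sum_i [ log d - (d k + 1) log mu_i + (k - 1) log(1 - mu_i^{-d}) ]
        return -np.sum(np.log(d) - (d * k + 1) * log_mu
                       + (k - 1) * np.log1p(-mu ** (-d)))

    return minimize_scalar(neg_log_lik, bounds=(0.01, 100), method="bounded").x

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))            # 5-dimensional Gaussian cloud
for k in (1, 4, 16, 64):                      # ID as a function of the scale
    print(f"k={k:3d}  estimated ID = {gride_like_id(X, k):.2f}")
```

Since the likelihood depends on the data only through neighbor distances, sweeping k yields the full ID-versus-scale curve from a single k-d tree, with no subsampling of the data set.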
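To give a flavor of the density analysis of Chapter 3, the following simplified sketch (a stand-in for the actual pipeline, using a basic k-NN density estimator and our own names) counts the density peaks of a representation, i.e., the points whose estimated density exceeds that of all their k nearest neighbors. A unimodal layer should yield a single peak, while a layer that has developed categorical structure yields roughly one peak per mode.

```python
# Simplified sketch: estimate the probability density of a layer's
# activations with a k-NN estimator and count the density peaks
# (points denser than all of their k nearest neighbors).
import numpy as np
from scipy.spatial import cKDTree

def count_density_peaks(activations, k=30, d=1.0):
    X = np.asarray(activations)
    dist, idx = cKDTree(X).query(X, k=k + 1)   # idx[:, 0] is the point itself
    # k-NN log-density up to an additive constant: log rho_i = -d * log r_k(i).
    # If the intrinsic dimension d is unknown, any d > 0 preserves the ranking,
    # which is all that peak detection needs.
    log_rho = -d * np.log(dist[:, k])
    # A peak is a point whose density exceeds that of all its k neighbors.
    peaks = np.flatnonzero(np.all(log_rho[:, None] > log_rho[idx[:, 1:]], axis=1))
    return len(peaks)

rng = np.random.default_rng(0)
unimodal = rng.standard_normal((3000, 2))                        # one blob
bimodal = np.vstack([unimodal[:1500] - 4, unimodal[1500:] + 4])  # two blobs
print("peaks (unimodal):", count_density_peaks(unimodal))
print("peaks (bimodal): ", count_density_peaks(bimodal))
```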
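Finally, one simple way to probe the redundancy discussed in Chapter 4 (a hedged illustration, not the analysis performed in the chapter) is to count pairs of neurons whose activations are almost perfectly correlated across a set of inputs; the 0.95 threshold and all names below are our own choices.

```python
# Sketch: count pairs of neurons in a layer whose activations are nearly
# perfectly correlated across a set of inputs, as a crude proxy for the
# redundant representations studied in Chapter 4. Shown here on synthetic
# activations, not on a trained network.
import numpy as np

def redundant_pairs(activations, threshold=0.95):
    """activations: (n_samples, n_neurons) matrix of a layer's outputs."""
    corr = np.corrcoef(activations, rowvar=False)   # neuron-neuron correlations
    upper = np.triu(np.abs(corr), k=1)              # count each pair once
    return int(np.count_nonzero(upper > threshold))

rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 32))              # 32 independent "neurons"
# A "redundant" layer: each neuron duplicated, up to a little noise.
redundant = np.hstack([base, base + 0.05 * rng.standard_normal((1000, 32))])
print("independent layer:", redundant_pairs(base))       # expect 0
print("redundant layer:  ", redundant_pairs(redundant))  # expect ~32
```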