Over the past decade, deep learning has proven to be a highly effective tool for learning meaningful features from raw data. However, it remains an open question how deep networks perform hierarchical feature learning across layers. In this work, we attempt to unveil this mystery by investigating the structures of intermediate features. Motivated by our empirical finding that linear layers mimic the role of deep layers in nonlinear networks for feature learning, we explore how deep linear networks transform input data into output by investigating the output (i.e., the features) of each layer after training, in the context of multi-class classification problems. Toward this goal, we first define two metrics that measure the within-class compression and the between-class discrimination of intermediate features, respectively. Through theoretical analysis of these two metrics, we show that the evolution of features follows a simple and quantitative pattern from shallow to deep layers when the input data is nearly orthogonal and the network weights are minimum-norm, balanced, and approximately low-rank: each layer of the linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate with respect to the number of layers that the data have passed through. To the best of our knowledge, this is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks. Empirically, our extensive experiments not only validate our theoretical results numerically but also reveal a similar pattern in deep nonlinear networks, which aligns well with recent empirical studies. Moreover, we demonstrate the practical implications of our results in transfer learning. Our code is available at \url{https://github.com/Heimine/PNC_DLN}.
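To make the two quantities concrete, the following is a minimal sketch of how within-class compression and between-class discrimination might be computed from the features of a single layer. The abstract does not state the exact definitions, so the formulas here are assumptions: compression is taken as the ratio of within-class to between-class scatter (smaller means more compressed), and discrimination as one minus the largest cosine similarity between distinct class means (larger means better separated). The function names and the toy data are hypothetical and are not taken from the released code.

```python
import numpy as np


def within_class_compression(features, labels):
    """Assumed metric: within-class scatter divided by between-class scatter."""
    global_mean = features.mean(axis=0)
    within = 0.0
    between = 0.0
    for k in np.unique(labels):
        class_feats = features[labels == k]
        class_mean = class_feats.mean(axis=0)
        # Spread of samples around their own class mean.
        within += np.sum((class_feats - class_mean) ** 2)
        # Spread of the class mean around the global mean, weighted by class size.
        between += class_feats.shape[0] * np.sum((class_mean - global_mean) ** 2)
    return within / between


def between_class_discrimination(features, labels):
    """Assumed metric: 1 - max cosine similarity between distinct class means."""
    means = np.stack([features[labels == k].mean(axis=0) for k in np.unique(labels)])
    unit = means / np.linalg.norm(means, axis=1, keepdims=True)
    cos = unit @ unit.T
    # Exclude each class's similarity with itself.
    np.fill_diagonal(cos, -np.inf)
    return 1.0 - cos.max()


# Toy features for one layer: two classes with orthogonal means and small noise.
X = np.array([[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]])
y = np.array([0, 0, 1, 1])
print(within_class_compression(X, y), between_class_discrimination(X, y))
```

Evaluating both functions on the output of every layer of a trained network, and plotting the values against layer index, is one way to inspect whether compression decays geometrically and discrimination grows linearly with depth.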