Understanding Deep Representation Learning via Layerwise Feature Compression and Discrimination

Over the past decade, deep learning has proven to be a highly effective tool for learning meaningful features from raw data. However, it remains an open question how deep networks perform hierarchical feature learning across layers. In this work, we address this question by investigating the structures of intermediate features. Motivated by our empirical finding that linear layers mimic the roles of deep layers in nonlinear networks for feature learning, we explore how deep linear networks transform input data into output by investigating the output (i.e., features) of each layer after training in the context of multi-class classification problems. Toward this goal, we first define two metrics that measure, respectively, the within-class compression and the between-class discrimination of intermediate features. Through theoretical analysis of these two metrics, we show that the evolution of features follows a simple and quantitative pattern from shallow to deep layers when the input data is nearly orthogonal and the network weights are minimum-norm, balanced, and approximately low-rank: each layer of the linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate with respect to the number of layers that the data have passed through. To the best of our knowledge, this is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks. Empirically, our extensive experiments not only validate our theoretical results numerically but also reveal a similar pattern in deep nonlinear networks, which aligns well with recent empirical studies. Moreover, we demonstrate the practical implications of our results in transfer learning. Our code is available at \url{https://github.com/Heimine/PNC_DLN}.
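
To make the two layerwise metrics concrete, the following is a minimal sketch of how within-class compression and between-class discrimination could be measured on the features of each layer of a deep linear network. The abstract does not state the exact formulas, so the definitions used here are assumptions: compression is taken as the ratio of within-class to between-class scatter (trace of each covariance, up to normalization), and discrimination as one minus the largest cosine similarity between distinct class means. The function names `compression_metric`, `discrimination_metric`, and `layerwise_metrics` are hypothetical and are not taken from the linked repository.

```python
import numpy as np


def compression_metric(features, labels):
    """Assumed definition: trace(within-class scatter) / trace(between-class scatter).

    features: (num_samples, dim) array, one row per sample.
    labels:   (num_samples,) integer class labels.
    Smaller values mean features of the same class are more tightly clustered.
    """
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for k in np.unique(labels):
        class_feats = features[labels == k]
        class_mean = class_feats.mean(axis=0)
        # Sum of squared deviations from the class mean.
        within += np.sum((class_feats - class_mean) ** 2)
        # Squared deviation of the class mean from the global mean,
        # weighted by the number of samples in the class.
        between += class_feats.shape[0] * np.sum((class_mean - global_mean) ** 2)
    return within / between


def discrimination_metric(features, labels):
    """Assumed definition: 1 - max cosine similarity between distinct class means.

    Larger values mean the class means are better separated in angle.
    """
    classes = np.unique(labels)
    means = np.stack([features[labels == k].mean(axis=0) for k in classes])
    unit = means / np.linalg.norm(means, axis=1, keepdims=True)
    cos = unit @ unit.T
    np.fill_diagonal(cos, -np.inf)  # ignore each mean's similarity with itself
    return 1.0 - cos.max()


def layerwise_metrics(weights, inputs, labels):
    """Push inputs through a bias-free linear network and record both metrics.

    weights: list of (out_dim, in_dim) matrices, ordered from first to last layer.
    Returns a list of (compression, discrimination) pairs, one per layer.
    """
    results = []
    feats = inputs
    for W in weights:
        feats = feats @ W.T  # output of the current linear layer
        results.append(
            (compression_metric(feats, labels), discrimination_metric(feats, labels))
        )
    return results
```

Given the weight matrices of a trained network, `layerwise_metrics(weights, X, y)` returns one pair per layer, which can then be plotted against depth (compression on a log scale) to check for the geometric and linear trends described above. The sketch assumes at least two classes with distinct, nonzero class means; otherwise the ratios are undefined.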
