The ResNet architecture has been widely adopted in deep learning because its simple skip connections yield a significant boost in performance, yet the underlying mechanisms behind its success remain largely unknown. In this paper, we conduct a thorough empirical study of the ResNet architecture in classification tasks by linearizing its constituent residual blocks using Residual Jacobians and measuring their singular value decompositions. Our measurements reveal a process called Residual Alignment (RA) characterized by four properties: (RA1) intermediate representations of a given input are equispaced on a line, embedded in high-dimensional space, as observed by Gai and Zhang [2021]; (RA2) the top left and right singular vectors of the Residual Jacobians align with each other and across different depths; (RA3) Residual Jacobians have rank at most C for fully-connected ResNets, where C is the number of classes; and (RA4) the top singular values of the Residual Jacobians scale inversely with depth. RA consistently occurs in models that generalize well, in both fully-connected and convolutional architectures, across various depths and widths, for varying numbers of classes, and on all tested benchmark datasets, but it ceases to occur once the skip connections are removed. It also provably occurs in a novel mathematical model we propose. This phenomenon reveals a strong alignment between the residual branches of a ResNet (RA2+4), imparting a highly rigid geometric structure to the intermediate representations as they progress linearly through the network (RA1) up to the final layer, where they undergo Neural Collapse.
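To make the measurement procedure concrete, the following is a minimal sketch in PyTorch, under the assumption that each Residual Jacobian is taken as the Jacobian of a block's residual branch evaluated at that block's input; the toy fully-connected ResNet, its width, and its depth are illustrative choices rather than the paper's exact experimental setup.

import torch
import torch.nn as nn

width, num_blocks = 64, 8  # illustrative sizes, not the paper's configuration

class ResidualBlock(nn.Module):
    def __init__(self, width):
        super().__init__()
        # Residual branch F_i; the block computes x + F_i(x) via the skip connection.
        self.branch = nn.Sequential(nn.Linear(width, width), nn.ReLU(),
                                    nn.Linear(width, width))

    def forward(self, x):
        return x + self.branch(x)

blocks = nn.ModuleList(ResidualBlock(width) for _ in range(num_blocks))

x = torch.randn(width)  # intermediate representation of a single input
for i, block in enumerate(blocks):
    # Residual Jacobian J_i = dF_i/dx at the block's input, i.e. the block's
    # first-order linearization x + J_i x.
    J = torch.autograd.functional.jacobian(block.branch, x)
    # Singular value decomposition of the Residual Jacobian; the top singular
    # vectors and values are the quantities examined in RA2-RA4.
    U, S, Vh = torch.linalg.svd(J)
    print(f"block {i}: top singular value {S[0].item():.3f}")
    x = block(x).detach()  # propagate the representation to the next block

In practice, such Jacobians would be evaluated at the representations of training inputs in a trained model, with singular vectors compared across blocks to assess alignment across depth.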