We present a stepping stone toward a deeper understanding of convolutional neural networks (CNNs): a theory of learning in linear CNNs. By analyzing the gradient descent equations, we find that the evolution of the network during training is determined by the interplay between the structure of the dataset and the structure of the convolutional network. We show that linear CNNs discover the statistical structure of the dataset through non-linear, ordered, stage-like transitions, and that the speed of this discovery depends on the relationship between the dataset and the convolutional network structure. Moreover, we find that this interplay lies at the heart of what we call the "dominant frequency bias", whereby linear CNNs make these discoveries using only the dominant frequencies of the different structural components of the dataset. We also provide experiments showing how our theory relates to the deep, non-linear CNNs used in practice. Our findings shed new light on the inner workings of CNNs and can help explain their shortcut learning and their tendency to rely on texture rather than shape.
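The ordered, stage-like learning described above can be illustrated with a toy model. The following Python sketch is a minimal illustration under simplifying assumptions chosen for this example, not the analysis or code behind this work: a two-layer, single-channel linear CNN with full-width circular kernels, whitened inputs, and a hypothetical teacher kernel whose spectrum has a strong frequency-1 component and a weak frequency-3 component. The helper names (`circ_conv`, `circ_corr`, `train_linear_cnn`, `half_time`), the teacher, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def circ_conv(a, b):
    # Circular convolution: out[i] = sum_j a[i - j] * b[j], computed via the FFT.
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

def circ_corr(a, r):
    # Adjoint of "convolve with a": out[j] = sum_i a[i - j] * r[i].
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(r)).real

def train_linear_cnn(teacher, lr=0.05, steps=400, init_std=1e-2, seed=0):
    """Hypothetical toy: f(x) = circ_conv(w2, circ_conv(w1, x)) with full-width kernels.

    For whitened inputs (E[x x^T] = I) the expected squared error is
    (n / 2) * ||circ_conv(w2, w1) - teacher||^2; the constant n is folded into lr.
    Returns the spectrum of the end-to-end kernel after every step.
    """
    rng = np.random.default_rng(seed)
    n = teacher.size
    w1 = init_std * rng.standard_normal(n)  # small init produces a long initial plateau
    w2 = init_std * rng.standard_normal(n)
    history = []
    for _ in range(steps):
        residual = circ_conv(w2, w1) - teacher
        g1 = circ_corr(w2, residual)  # gradient with respect to w1
        g2 = circ_corr(w1, residual)  # gradient with respect to w2 (convolution commutes)
        w1 -= lr * g1
        w2 -= lr * g2
        history.append(np.fft.fft(w1) * np.fft.fft(w2))
    return np.array(history)

n = 8
j = np.arange(n)
# Hypothetical teacher: strong frequency-1 component, weak frequency-3 component.
teacher = 1.0 * np.cos(2 * np.pi * 1 * j / n) + 0.25 * np.cos(2 * np.pi * 3 * j / n)
T = np.fft.fft(teacher)

spectra = train_linear_cnn(teacher)

def half_time(k):
    # First step at which the learned magnitude at frequency k reaches half the target's.
    frac = np.abs(spectra[:, k]) / np.abs(T[k])
    return int(np.argmax(frac >= 0.5))

t_strong, t_weak = half_time(1), half_time(3)
print("frequency 1 learned at step", t_strong, "; frequency 3 at step", t_weak)
```

Because circular convolution is diagonalized by the discrete Fourier transform, each frequency of the end-to-end kernel in this toy evolves as its own scalar two-layer linear network: it sits on a plateau, then rises sharply, and frequencies where the target has a larger magnitude escape the plateau sooner. Once kernels are smaller than the input, or the input statistics are not shift-invariant, the frequencies no longer evolve independently, and this toy does not capture those effects.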