Our theoretical understanding of the inner workings of general convolutional neural networks (CNNs) is limited. Here we present a stepping stone towards such understanding in the form of a theory of learning in linear CNNs. By analyzing the gradient descent equations, we find that using convolutions leads to a mismatch between the dataset structure and the network structure. We show that linear CNNs discover the statistical structure of the dataset through non-linear, stage-like transitions, and that the speed of discovery depends on this structural mismatch. Moreover, we find that the mismatch lies at the heart of what we call the 'dominant frequency bias': linear CNNs arrive at these discoveries using only the dominant frequencies of the different structural parts present in the dataset. Our findings can help explain several characteristics of general CNNs, such as their shortcut learning and their tendency to rely on texture instead of shape.