Understanding how convolutional neural networks learn features from image data is a fundamental problem in machine learning and computer vision. In this work, we identify a mechanism for this feature learning. We posit the Convolutional Neural Feature Ansatz, which states that the covariance of the filters in any convolutional layer is proportional to the average gradient outer product (AGOP) taken with respect to patches of the input to that layer. We present extensive empirical evidence for our ansatz, including high correlation between filter covariances and patch-based AGOPs for the convolutional layers of standard architectures such as AlexNet, VGG, and ResNets pre-trained on ImageNet. We also provide supporting theoretical evidence. We then demonstrate the generality of our result by using the patch-based AGOP to enable deep feature learning in convolutional kernel machines. We refer to the resulting algorithm as (Deep) ConvRFM and show that it recovers features similar to those learned by deep convolutional networks, including the notable emergence of edge detectors. Moreover, we find that Deep ConvRFM overcomes previously identified limitations of convolutional kernels, such as their inability to adapt to local signals in images, and as a result yields sizable performance improvements over fixed convolutional kernels.
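To make the two sides of the ansatz concrete, the following minimal sketch computes them for a single toy convolutional layer. The first side is the filter covariance W^T W, where each row of W is a flattened filter. The second is a patch-based AGOP, formed by averaging grad_p f grad_p f^T over images and patch positions. Several assumptions apply. The model f(x) = sum_i v_i . ReLU(W p_i(x)) is a hypothetical one-layer network. Patches are extracted with stride 1 and no padding, and each patch is treated as an independent input when differentiating. We normalize by the number of images times patches, which changes the AGOP only by a constant factor. Finally, we use Pearson correlation of matrix entries as the similarity measure. All helper names are hypothetical. The random weights only show how the quantities are computed, because the ansatz itself concerns trained networks.

```python
import numpy as np

def extract_patches(images, q):
    """All q x q patches (stride 1, no padding), flattened: shape (n, P, c*q*q)."""
    n, c, h, w = images.shape
    patches = []
    for i in range(h - q + 1):
        for j in range(w - q + 1):
            patches.append(images[:, :, i:i + q, j:j + q].reshape(n, -1))
    return np.stack(patches, axis=1)

def toy_conv_net(patches, W, v):
    """Hypothetical one-layer model f(x) = sum_i v_i . relu(W p_i(x)), one output per image."""
    pre = patches @ W.T                                    # (n, P, k) filter responses
    return np.sum(np.maximum(pre, 0.0) * v, axis=(1, 2))  # (n,)

def patch_grads(patches, W, v):
    """Gradient of f with respect to each patch, treating patches as independent inputs."""
    pre = patches @ W.T                       # (n, P, k)
    g = (pre > 0).astype(float) * v           # df / d(W p_i), ReLU derivative times readout
    return g @ W                              # df / dp_i = W^T g_i, shape (n, P, d)

def patch_agop(patches, W, v):
    """Average over images and patch positions of grad_p f grad_p f^T, shape (d, d)."""
    grads = patch_grads(patches, W, v)
    flat = grads.reshape(-1, grads.shape[-1])
    return flat.T @ flat / flat.shape[0]

def pearson(A, B):
    """Pearson correlation between the entries of two matrices of equal shape."""
    a, b = A.ravel() - A.mean(), B.ravel() - B.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Usage: compare the filter covariance W^T W with the patch-based AGOP.
rng = np.random.default_rng(0)
images = rng.standard_normal((8, 3, 10, 10))
q, k = 3, 16
patches = extract_patches(images, q)           # (8, 64, 27)
W = rng.standard_normal((k, 3 * q * q))        # k filters, each flattened to length 27
v = rng.standard_normal((patches.shape[1], k)) # hypothetical per-position readout weights
G = patch_agop(patches, W, v)
print(pearson(W.T @ W, G))
```

In a trained network, the same comparison would use the layer's learned filters for W and gradients of the actual network output with respect to that layer's input patches. The ansatz predicts a high correlation in that setting.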