This paper focuses on improving the mathematical interpretability of convolutional neural networks (CNNs) in the context of image classification. Specifically, we tackle the instability issue arising in their first layer, which tends to learn parameters that closely resemble oriented band-pass filters when trained on datasets like ImageNet. Subsampled convolutions with such Gabor-like filters are prone to aliasing, causing sensitivity to small input shifts. In this context, we establish conditions under which the max pooling operator approximates a complex modulus, which is nearly shift invariant. We then derive a measure of shift invariance for subsampled convolutions followed by max pooling. In particular, we highlight the crucial role played by the filter's frequency and orientation in achieving stability. We experimentally validate our theory by considering a deterministic feature extractor based on the dual-tree complex wavelet packet transform, a particular case of discrete Gabor-like decomposition.
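The abstract's central claim lends itself to a quick numerical check. Below is a minimal 1D sketch in NumPy, an illustration under assumed parameters rather than the paper's dual-tree wavelet packet implementation: a signal is filtered with a complex Gabor-like filter, and max pooling the real part of the output over small windows closely tracks the complex modulus, which stays nearly constant under a one-sample input shift. The envelope width `sigma`, frequency `xi`, and pooling size are arbitrary choices for the demo.

```python
import numpy as np

# Toy illustration of the abstract's claim, not the authors' implementation:
# for a Gabor-like band-pass filter, max pooling the real part of the
# filtered signal approximates the complex modulus, which is nearly
# invariant to small input shifts. All parameters are assumptions.

rng = np.random.default_rng(0)
n, pool = 512, 8
x = rng.standard_normal(n)

# Complex Gabor-like filter: Gaussian envelope times a complex exponential.
t = np.arange(-16, 17)
sigma, xi = 4.0, 1.2                      # assumed envelope width / frequency
gabor = np.exp(-t**2 / (2 * sigma**2)) * np.exp(1j * xi * t)

def pooled_features(signal):
    """Max-pooled real part and max-pooled modulus of the filtered signal."""
    y = np.convolve(signal, gabor, mode="same")
    windows = lambda z: z[: n - n % pool].reshape(-1, pool)
    return windows(np.real(y)).max(axis=1), windows(np.abs(y)).max(axis=1)

pooled, modulus = pooled_features(x)
rel = lambda a, b: np.linalg.norm(a - b) / np.linalg.norm(b)
print("max pooling vs complex modulus:", rel(pooled, modulus))

# Near shift invariance: compare features of x and its one-sample shift.
pooled_s, modulus_s = pooled_features(np.roll(x, 1))
print("modulus shift sensitivity:    ", rel(modulus_s, modulus))
print("max pooling shift sensitivity:", rel(pooled_s, pooled))
```

The pooling window (8 samples) is chosen to exceed one oscillation period of the filter, roughly 2*pi/xi, so that within each window the real part attains a point where its phase aligns with the envelope; this is the regime in which the approximation of the modulus is expected to hold.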