This paper focuses on improving the mathematical interpretability of convolutional neural networks (CNNs) in the context of image classification. Specifically, we tackle the instability issue arising in their first layer, which tends to learn parameters that closely resemble oriented band-pass filters when trained on datasets like ImageNet. Subsampled convolutions with such Gabor-like filters are prone to aliasing, causing sensitivity to small input shifts. In this context, we establish conditions under which the max pooling operator approximates a complex modulus, which is nearly shift invariant. We then derive a measure of shift invariance for subsampled convolutions followed by max pooling. In particular, we highlight the crucial role played by the filter's frequency and orientation in achieving stability. We experimentally validate our theory by considering a deterministic feature extractor based on the dual-tree complex wavelet packet transform, a particular case of discrete Gabor-like decomposition.
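As an informal illustration of the mechanism described above, the following one-dimensional NumPy sketch compares two feature maps computed from the same subsampled Gabor-like convolution: the max-pooled real part (the CNN-style pipeline) and the complex modulus, pooled on the same windows so the two can be compared pointwise. It is not the dual-tree complex wavelet packet extractor used in the experiments. The helper names (`gabor_1d`, `circular_conv`, `max_pool_1d`, `features`) and all parameter values (`omega`, `sigma`, `stride`, `pool`) are assumptions chosen for this example only.

```python
import numpy as np

def gabor_1d(size, omega, sigma):
    """Complex Gabor-like filter: Gaussian window times a complex exponential."""
    n = np.arange(size) - size // 2
    return np.exp(-0.5 * (n / sigma) ** 2) * np.exp(1j * omega * n)

def circular_conv(x, h):
    """Circular convolution of x with h via the FFT (h is zero-padded to len(x))."""
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, n=len(x)))

def max_pool_1d(v, width):
    """Non-overlapping max pooling; samples that do not fill a window are dropped."""
    m = (len(v) // width) * width
    return v[:m].reshape(-1, width).max(axis=1)

def features(x, h, stride, pool):
    """Return (max-pooled real part, max-pooled modulus) of the subsampled response."""
    y = circular_conv(x, h)[::stride]        # subsampled complex response
    real_max = max_pool_1d(y.real, pool)     # real-valued filter followed by max pooling
    modulus = max_pool_1d(np.abs(y), pool)   # complex modulus on the same windows
    return real_max, modulus

def rel_change(a, b):
    """Relative Euclidean distance between two feature vectors."""
    return np.linalg.norm(a - b) / np.linalg.norm(a)

# Hypothetical settings: random input, mid-band filter, stride 2, pooling width 2.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
h = gabor_1d(size=31, omega=np.pi / 2, sigma=4.0)

real_max, modulus = features(x, h, stride=2, pool=2)
real_max_s, modulus_s = features(np.roll(x, 1), h, stride=2, pool=2)  # input shifted by one sample

print("max-pooled real part, relative change:", rel_change(real_max, real_max_s))
print("complex modulus, relative change:     ", rel_change(modulus, modulus_s))
```

Since the real part of a complex number never exceeds its modulus, `real_max` is bounded above by `modulus` entrywise. How close the two are, and how much each one changes under the one-sample shift, depends on `omega`, `stride`, and `pool`. Sweeping `omega` in this sketch is one way to probe the dependence on filter frequency that the stability analysis highlights.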