We propose a novel antialiasing method to increase shift invariance and prediction accuracy in convolutional neural networks. Specifically, we replace the first-layer combination "real-valued convolutions + max pooling" ($\mathbb{R}$Max) with "complex-valued convolutions + modulus" ($\mathbb{C}$Mod), which is stable to translations. To justify this approach, we claim that $\mathbb{C}$Mod and $\mathbb{R}$Max produce comparable outputs when the convolution kernel is band-pass and oriented (i.e., a Gabor-like filter). In this context, $\mathbb{C}$Mod can be considered a stable alternative to $\mathbb{R}$Max. Thus, prior to antialiasing, we force the convolution kernels to adopt such a Gabor-like structure. We call the corresponding architecture a mathematical twin, because it employs a well-defined mathematical operator to mimic the behavior of the original, freely trained model. Our antialiasing approach achieves higher accuracy on the ImageNet and CIFAR-10 classification tasks than prior methods based on low-pass filtering. Arguably, because our approach retains high-frequency details, it strikes a better balance between shift invariance and information preservation, which results in improved performance. Furthermore, it has a lower computational cost and memory footprint than concurrent work, making it a promising solution for practical implementation.
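The contrast between $\mathbb{R}$Max and $\mathbb{C}$Mod can be illustrated with a minimal one-dimensional NumPy sketch. It is not the authors' implementation: the function names (`gabor_kernel`, `r_max`, `c_mod`, `shift_instability`), the strides, the pooling size, and the test signal are all assumptions chosen to make the effect visible. Both operators are given the same overall subsampling factor of 4, and the input is an oscillation at the kernel's center frequency.

```python
import numpy as np

def gabor_kernel(n_taps=25, omega=np.pi / 2, sigma=3.0):
    """Complex 1-D Gabor-like kernel: Gaussian window times a complex exponential."""
    t = np.arange(n_taps) - n_taps // 2
    w = np.exp(-0.5 * (t / sigma) ** 2) * np.exp(1j * omega * t)
    return w / np.abs(w).sum()

def circular_conv(x, w):
    """Circular convolution via the FFT, so that np.roll is an exact translation."""
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(w, len(x)))

def r_max(x, w, stride=2, pool=2):
    """RMax (assumed form): real part of the strided convolution, then max pooling."""
    y = circular_conv(x, w).real[::stride]
    return y.reshape(-1, pool).max(axis=1)  # non-overlapping pooling windows

def c_mod(x, w, stride=4):
    """CMod (assumed form): complex convolution with the combined stride, then modulus."""
    return np.abs(circular_conv(x, w)[::stride])

def shift_instability(op, x, w, shifts=(1, 2, 3)):
    """Largest relative L2 change of the output when the input is shifted by a few samples."""
    ref = op(x, w)
    return max(
        np.linalg.norm(op(np.roll(x, s), w) - ref) / np.linalg.norm(ref)
        for s in shifts
    )

# Hypothetical test signal: a slowly modulated oscillation at the kernel's frequency.
n = np.arange(256)
x = np.exp(-0.5 * ((n - 128) / 40.0) ** 2) * np.cos(np.pi / 2 * n)
w = gabor_kernel()

rmax_instability = shift_instability(r_max, x, w)
cmod_instability = shift_instability(c_mod, x, w)
print(f"RMax relative change under small shifts: {rmax_instability:.3f}")
print(f"CMod relative change under small shifts: {cmod_instability:.3f}")
```

In this toy setting the modulus tracks the slowly varying envelope of the filtered signal, so it changes little under a shift of a few samples, whereas the max-pooled real part depends on where the subsampling grid falls relative to the oscillation. The frequency $\pi/2$ was picked deliberately so that this phase dependence is pronounced; other choices would give a smaller gap.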