The translation equivariance of Convolutional Neural Networks (CNNs) is one reason for their great success in computer vision. However, standard CNNs are not equivariant to more general transformations such as rotation or scaling, which ultimately limits their generalization performance. To address this limitation, we devise a method that endows CNNs with simultaneous equivariance to translation, rotation, and scaling. Our approach defines a convolution-like operation and ensures equivariance through our proposed scalable Fourier-Argand representation. Because it avoids the computational issues that often arise in group-convolution operators, the method is about as efficient as a traditional network and introduces hardly any additional learnable parameters. We validate the efficacy of our approach on image classification, demonstrating its robustness and its ability to generalize to both scaled and rotated inputs.
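As a point of reference for the equivariance property discussed above, the following minimal sketch checks numerically that a plain convolution-style filter commutes with translations but not with rotations. It is not the proposed method: the helper `circular_correlate`, the image and kernel sizes, and the use of circular (wrap-around) boundaries are assumptions chosen only to make the check exact.

```python
import numpy as np

def circular_correlate(image, kernel):
    """Circular 2-D cross-correlation via the FFT (hypothetical helper)."""
    padded = np.zeros_like(image)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel  # zero-pad the kernel to the image size
    spectrum = np.fft.fft2(image) * np.conj(np.fft.fft2(padded))
    return np.real(np.fft.ifft2(spectrum))

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))   # assumed toy input
kernel = rng.standard_normal((3, 3))    # assumed generic (asymmetric) filter

# Translation: filtering a shifted image equals shifting the filtered image.
shift = (3, 5)
shifted_then_filtered = circular_correlate(np.roll(image, shift, axis=(0, 1)), kernel)
filtered_then_shifted = np.roll(circular_correlate(image, kernel), shift, axis=(0, 1))
translation_equivariant = np.allclose(shifted_then_filtered, filtered_then_shifted)

# Rotation: the same identity generally fails for a 90-degree rotation.
rotated_then_filtered = circular_correlate(np.rot90(image), kernel)
filtered_then_rotated = np.rot90(circular_correlate(image, kernel))
rotation_equivariant = np.allclose(rotated_then_filtered, filtered_then_rotated)

print("translation equivariant:", translation_equivariant)
print("rotation equivariant:", rotation_equivariant)
```

With a generic asymmetric kernel, the first check holds and the second does not, which is the gap that rotation- and scale-equivariant constructions aim to close.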