Filter-decomposition-based group-equivariant convolutional neural networks (G-CNNs) have been shown to increase the data efficiency of CNNs and to contribute to better interpretability and controllability of CNN models. However, existing filter-decomposition-based affine G-CNN methods rely on parameter sharing to achieve high parameter efficiency and suffer from a heavy computational burden. They also use a limited number of transformations and, in particular, ignore the shear transform in applications. In this paper, we address these problems by emphasizing the importance of the diversity of transformations. We propose a flexible and efficient strategy based on weighted filter-wise Monte Carlo sampling. In addition, we introduce a shear-equivariant CNN to address the highly sparse representations of natural images. We demonstrate that the proposed methods are intrinsically an efficient generalization of traditional CNNs, and we explain the advantage of the bottleneck architectures used in existing state-of-the-art CNN models such as ResNet, ResNeXt, and ConvNeXt from the group-equivariant perspective. Experiments on image classification and image denoising tasks show that, with a suitable set of filter bases, our methods achieve superior performance to standard CNNs with high data efficiency. The code will be available at https://github.com/ZhaoWenzhao/MCG_CNN.