Convolutional neural networks (CNNs) can recognize objects regardless of their position in the image because the convolution operation is translation-equivariant. Group-equivariant CNNs extend this equivariance to other transformations of the input. Handling objects and object parts at different scales remains challenging, and scale can vary for several reasons, such as the underlying object size or the resolution of the imaging modality. In this paper, we propose a scale-equivariant convolutional layer for three-dimensional data that guarantees scale-equivariance in 3D CNNs. Scale-equivariance removes the burden of learning each possible scale separately, allowing the network to focus on higher-level learning goals, which leads to better results and better data efficiency. We first give an overview of the theoretical foundations and prior work on scale-equivariant neural networks in the two-dimensional domain. We then transfer these concepts from 2D to 3D and construct a scale-equivariant convolutional layer for 3D data. Using the proposed layer, we build a scale-equivariant U-Net for medical image segmentation and compare it with a non-scale-equivariant baseline. Our experiments demonstrate the effectiveness of the proposed method in achieving scale-equivariance for 3D medical image analysis. We publish our code at https://github.com/wimmerth/scale-equivariant-3d-convnet for further research and application.
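To make the notion of equivariance concrete, the following is a minimal NumPy sketch and is not taken from the published repository. It assumes circular (wrap-around) boundaries so that the translation check holds exactly, and it uses plain subsampling as a crude stand-in for rescaling. The helper name `circular_correlate` and all array sizes are hypothetical choices made for illustration.

```python
import numpy as np


def circular_correlate(x, k):
    """3D cross-correlation with wrap-around boundaries (illustrative helper)."""
    out = np.zeros(x.shape, dtype=float)
    for idx in np.ndindex(k.shape):
        # np.roll with a negative shift gives rolled[p] = x[p + idx]
        shift = tuple(-i for i in idx)
        out += k[idx] * np.roll(x, shift=shift, axis=(0, 1, 2))
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 8))   # toy 3D volume
k = rng.standard_normal((3, 3, 3))   # toy 3D filter

# Translation: shifting the input and then filtering gives the same
# result as filtering and then shifting the output.
s = (2, 1, 3)
shift_then_filter = circular_correlate(np.roll(x, s, axis=(0, 1, 2)), k)
filter_then_shift = np.roll(circular_correlate(x, k), s, axis=(0, 1, 2))
translation_equivariant = np.allclose(shift_then_filter, filter_then_shift)

# Scale (approximated here by subsampling with stride 2): the two
# orders of operation generally disagree for an ordinary filter.
scale_then_filter = circular_correlate(x[::2, ::2, ::2], k)
filter_then_scale = circular_correlate(x, k)[::2, ::2, ::2]
scale_equivariant = np.allclose(scale_then_filter, filter_then_scale)

print("translation-equivariant:", translation_equivariant)
print("scale-equivariant:", scale_equivariant)
```

The first check passes because a shift commutes with every term of the correlation sum. The second fails for a generic filter, which is the gap a scale-equivariant layer is meant to close.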