As a fundamental part of computational healthcare, Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) provide volumetric data, which makes algorithms for 3D image analysis a necessity. 2D Convolutional Neural Networks (CNNs) are computationally cheap, but they can only extract in-plane spatial information. In contrast, 3D CNNs can extract three-dimensional features, but their higher computational cost and latency limit their use in clinical practice, which requires fast and efficient models. Inspired by the field of video action recognition, we propose a new 2D-based model dubbed Slice SHift UNet (SSH-UNet), which encodes three-dimensional features at the complexity of a 2D CNN. More precisely, multi-view features are learned collaboratively by performing 2D convolutions along the three orthogonal planes of a volume under a weight-sharing mechanism. The third dimension, which the 2D convolution neglects, is reincorporated by shifting a portion of the feature maps along the slice axis. We validate the effectiveness of our approach on the Multi-Modality Abdominal Multi-Organ Segmentation (AMOS) and Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) datasets, showing that SSH-UNet is more efficient than state-of-the-art architectures while performing on par with them.
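The two mechanisms described above can be sketched in NumPy. The first is weight-shared 2D convolution along the three orthogonal planes; the second is shifting part of the feature maps along the slice axis. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the following:

* a (C, D, H, W) feature layout with D as the slice axis;
* a shift borrowed from the Temporal Shift Module convention, where C // 8 channels move one slice forward, C // 8 move one slice backward, and vacated slices are zero-filled;
* a single-channel 3×3 kernel with zero "same" padding;
* simple averaging as the rule that fuses the three views.

The function names `slice_shift`, `conv2d_same`, and `multiview_conv` are hypothetical.

```python
import numpy as np


def slice_shift(x, fold_div=8):
    """Shift part of the channels of a (C, D, H, W) tensor along D.

    Assumed TSM-style scheme: channels [0, fold) move one slice forward,
    channels [fold, 2*fold) move one slice backward, the rest stay put.
    Vacated slices are zero-filled (no wrap-around).
    """
    fold = x.shape[0] // fold_div
    out = np.zeros_like(x)
    out[:fold, 1:] = x[:fold, :-1]                   # slice d takes slice d - 1
    out[fold:2 * fold, :-1] = x[fold:2 * fold, 1:]   # slice d takes slice d + 1
    out[2 * fold:] = x[2 * fold:]                    # unshifted channels
    return out


def conv2d_same(planes, kernel):
    """Apply one odd-sized k x k kernel to every plane of an (N, H, W) stack.

    Deep-learning-style convolution (cross-correlation) with zero padding,
    so each output plane keeps its H x W size.
    """
    k = kernel.shape[0]
    p = k // 2
    padded = np.pad(planes, ((0, 0), (p, p), (p, p)))
    n, h, w = planes.shape
    out = np.zeros((n, h, w), dtype=np.result_type(planes, kernel))
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * padded[:, i:i + h, j:j + w]
    return out


def multiview_conv(volume, kernel):
    """Convolve a (D, H, W) volume on its three orthogonal planes with the
    SAME 2D kernel (weight sharing), then average the responses (assumed fusion)."""
    # Planes spanned by (H, W): one per slice along D.
    view_hw = conv2d_same(volume, kernel)
    # Planes spanned by (D, W): one per index along H.
    view_dw = conv2d_same(volume.transpose(1, 0, 2), kernel).transpose(1, 0, 2)
    # Planes spanned by (D, H): one per index along W.
    view_dh = conv2d_same(volume.transpose(2, 0, 1), kernel).transpose(1, 2, 0)
    return (view_hw + view_dw + view_dh) / 3.0


# Usage: shift an 8-channel feature map with 4 slices, then convolve a delta volume.
x = np.arange(8 * 4).reshape(8, 4, 1, 1)
shifted = slice_shift(x)

vol = np.zeros((3, 3, 3))
vol[1, 1, 1] = 1.0
y = multiview_conv(vol, np.ones((3, 3)))
```

The shift only moves memory, so it adds no parameters and no multiply-adds. Because all three views reuse one kernel, their parameter count equals that of a single 2D convolution. In a real network these would be learned, multi-channel layers in a deep-learning framework rather than NumPy loops.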