In medical image segmentation, particularly in UNet-like architectures, upsampling is primarily used to transform smaller feature maps into larger ones, enabling feature fusion between encoder and decoder features and supporting multi-scale prediction. Conventional upsampling methods, such as transposed convolution and linear interpolation, operate on fixed positions: transposed convolution applies kernel elements to predetermined pixel or voxel locations, while linear interpolation assigns values based on fixed coordinates in the original feature map. These fixed-position approaches may fail to capture structural information beyond predefined sampling positions and can lead to artifacts or loss of detail. Inspired by deformable convolutions, we propose a novel upsampling method, Deformable Transposed Convolution (DTC), which learns dynamic coordinates (i.e., sampling positions) to generate high-resolution feature maps for both 2D and 3D medical image segmentation tasks. Experiments on 3D (e.g., BTCV15) and 2D datasets (e.g., ISIC18, BUSI) demonstrate that DTC can be effectively integrated into existing medical image segmentation models, consistently improving the decoder's feature reconstruction and detail recovery capability.
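To make the contrast between fixed and dynamic sampling positions concrete, the following is a minimal 1D sketch in pure Python. It is not the DTC operator itself and is not taken from the paper: it only illustrates the underlying idea that each output position may read the input at a shifted coordinate. The function names (`sample_linear`, `upsample_fixed`, `upsample_with_offsets`) are hypothetical, and the offsets are hand-picked numbers standing in for values that a learned layer would predict.

```python
import math


def sample_linear(signal, x):
    """Linearly interpolate `signal` at continuous coordinate x (clamped to the valid range)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    lo = int(math.floor(x))
    hi = min(lo + 1, len(signal) - 1)
    w = x - lo
    return (1.0 - w) * signal[lo] + w * signal[hi]


def upsample_fixed(signal, scale=2):
    """Fixed-position upsampling: output i always reads input coordinate i / scale."""
    return [sample_linear(signal, i / scale) for i in range(len(signal) * scale)]


def upsample_with_offsets(signal, offsets, scale=2):
    """Same output grid, but each output position is shifted by its own offset.

    Assumption for illustration: in a learned model the offsets would be
    predicted from the features; here the caller supplies them directly.
    """
    n_out = len(signal) * scale
    if len(offsets) != n_out:
        raise ValueError("need one offset per output position")
    return [sample_linear(signal, i / scale + offsets[i]) for i in range(n_out)]


# A step edge between input indices 1 and 2.
signal = [0.0, 0.0, 1.0, 1.0]

fixed = upsample_fixed(signal)
zero = upsample_with_offsets(signal, [0.0] * 8)
# Hand-picked offsets that pull the two samples nearest the edge away from it.
shifted = upsample_with_offsets(signal, [0.0, 0.0, 0.0, -0.5, 0.5, 0.0, 0.0, 0.0])

print(fixed)
print(shifted)
```

With all offsets set to zero the result equals the fixed-position output, so the fixed scheme is a special case of the offset scheme. With the hand-picked offsets, the sample that the fixed grid places halfway across the edge instead reads from one side of it, which is the kind of detail-preserving behavior that learned sampling positions are meant to make possible.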