The UNet model consists of fully convolutional network (FCN) layers arranged as contracting encoder and upsampling decoder maps. Nested arrangements of these encoder and decoder maps give rise to extensions of the UNet model, such as UNet^e and UNet++. Other refinements include constraining the outputs of the convolutional layers to discriminate between segment labels when trained end to end, a property called deep supervision. Deep supervision, however, reduces feature diversity in these nested UNet models despite their large parameter space. Furthermore, for texture segmentation, pixel correlations at multiple scales contribute to the classification task; hence, explicit deep supervision of shallower layers is likely to enhance performance. In this paper, we propose ADS_UNet, a stage-wise additive training algorithm that incorporates resource-efficient deep supervision in shallower layers and takes performance-weighted combinations of the sub-UNets to create the segmentation model. We provide empirical evidence on three histopathology datasets to support the claim that the proposed ADS_UNet reduces correlations between constituent features and improves performance while being more resource efficient. We demonstrate that ADS_UNet outperforms state-of-the-art Transformer-based models by 1.08 and 0.6 points on the CRAG and BCSS datasets, respectively, while requiring only 37% of the GPU consumption and 34% of the training time of those Transformer-based models.
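As a rough illustration of what a "performance-weighted combination of the sub-UNets" could look like, the sketch below averages per-pixel class probabilities from several sub-models, weighting each by a validation score. The function name `combine_sub_models`, the flattened `prob_maps[m][c][p]` layout, and the score-proportional weighting are all assumptions made for this example; the abstract does not specify the weighting formula, so this should not be read as the method used by ADS_UNet.

```python
def combine_sub_models(prob_maps, scores):
    """Performance-weighted vote over sub-model predictions (illustrative only).

    prob_maps: one entry per sub-model; each is a list over classes of
               flattened per-pixel probabilities, i.e. prob_maps[m][c][p].
    scores:    one positive validation score per sub-model (hypothetical metric).
    Returns (labels, weights), where labels[p] is the winning class at pixel p.
    """
    total = sum(scores)
    # Assumed weighting: proportional to score, normalised to sum to 1.
    weights = [s / total for s in scores]
    num_classes = len(prob_maps[0])
    num_pixels = len(prob_maps[0][0])
    labels = []
    for p in range(num_pixels):
        # Weighted sum of each class's probability across sub-models.
        combined = [
            sum(w * pm[c][p] for w, pm in zip(weights, prob_maps))
            for c in range(num_classes)
        ]
        # Pick the class with the largest combined probability.
        labels.append(max(range(num_classes), key=combined.__getitem__))
    return labels, weights
```

With two sub-models that disagree, the predicted label follows whichever model carries more weight, which is the intuition behind weighting the constituents by performance rather than averaging them uniformly.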