Augmentation-Free Dense Contrastive Knowledge Distillation for Efficient Semantic Segmentation

In recent years, knowledge distillation methods based on contrastive learning have achieved promising results on image classification and object detection tasks. However, we note that this line of research has paid less attention to semantic segmentation. Existing methods rely heavily on data augmentation and memory buffers. This entails high computational resource demands when they are applied to semantic segmentation, a task that requires preserving high-resolution feature maps to make dense pixel-wise predictions. To address this problem, we present Augmentation-free Dense Contrastive Knowledge Distillation (Af-DCD), a new contrastive distillation learning paradigm for training compact and accurate deep neural networks for semantic segmentation applications. Af-DCD leverages a masked feature mimicking strategy and formulates a novel contrastive learning loss by taking advantage of carefully designed feature partitions across both channel and spatial dimensions. This allows it to effectively transfer the dense and structured local knowledge learnt by the teacher model to a target student model while maintaining training efficiency. Extensive experiments on five mainstream benchmarks with various teacher-student network pairs demonstrate the effectiveness of our approach. For instance, with DeepLabV3-Res101 as the teacher, the DeepLabV3-Res18 and DeepLabV3-MBV2 models trained by Af-DCD reach 77.03% and 76.38% mIoU, respectively, on the Cityscapes dataset, setting new performance records. In addition, compared with individually trained counterparts, Af-DCD achieves absolute mIoU improvements of 3.26%, 3.04%, 2.75%, 2.30%, and 1.42% on Cityscapes, Pascal VOC, CamVid, ADE20K, and COCO-Stuff-164K, respectively. Code is available at https://github.com/OSVAI/Af-DCD
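The abstract describes the method only at a high level, so the following is a minimal, hypothetical pure-Python sketch of the general idea rather than the Af-DCD loss itself. It assumes that the channel dimension is split into equal groups and the flattened spatial dimension into fixed-size patches. Each student cell (channel group, position) is then contrasted against all teacher cells in the same patch with an InfoNCE loss, so positives and negatives come from a single image with no augmented views and no memory bank. Masking is illustrated as zeroing random spatial positions of the student feature, and any learnable projection is omitted. All names (`random_spatial_mask`, `partition_infonce`, `num_groups`, `patch_size`, `tau`) are illustrative assumptions. For the actual formulation, see the official code at the repository linked above.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical layout assumed here: feat[channel][flattened spatial position].
Feature = List[List[float]]


def random_spatial_mask(feat: Feature, ratio: float, seed: int = 0) -> Feature:
    """Zero out a random fraction `ratio` of spatial positions in every channel."""
    rng = random.Random(seed)
    num_pos = len(feat[0])
    masked = set(rng.sample(range(num_pos), int(ratio * num_pos)))
    return [[0.0 if p in masked else v for p, v in enumerate(ch)] for ch in feat]


def group_vector(feat: Feature, group: int, group_size: int, pos: int) -> List[float]:
    """Channels of one channel group, read at one spatial position."""
    lo = group * group_size
    return [feat[c][pos] for c in range(lo, lo + group_size)]


def cosine(a: List[float], b: List[float], eps: float = 1e-8) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + eps)  # eps keeps all-zero (masked) vectors finite


def partition_infonce(student: Feature, teacher: Feature, num_groups: int,
                      patch_size: int, tau: float = 0.1) -> float:
    """Hypothetical InfoNCE over (channel group, position) cells per spatial patch.

    The positive for student cell (g, p) is teacher cell (g, p); every other
    teacher cell in the same patch is a negative, so no augmented views or
    memory bank are needed.
    """
    num_channels, num_pos = len(student), len(student[0])
    assert num_channels % num_groups == 0 and num_pos % patch_size == 0
    group_size = num_channels // num_groups
    total, count = 0.0, 0
    for start in range(0, num_pos, patch_size):
        cells = [(g, p) for g in range(num_groups)
                 for p in range(start, start + patch_size)]
        keys: Dict[Tuple[int, int], List[float]] = {
            c: group_vector(teacher, c[0], group_size, c[1]) for c in cells}
        for g, p in cells:
            anchor = group_vector(student, g, group_size, p)
            logits = [cosine(anchor, keys[c]) / tau for c in cells]
            m = max(logits)  # log-sum-exp with max subtraction for stability
            lse = m + math.log(sum(math.exp(x - m) for x in logits))
            total += lse - cosine(anchor, keys[(g, p)]) / tau
            count += 1
    return total / count


# Toy usage: 8 channels, 16 positions, 2 channel groups, patches of 4 positions.
rng = random.Random(42)
teacher = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]
aligned = [[v + 0.01 * rng.gauss(0.0, 1.0) for v in ch] for ch in teacher]
unrelated = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]
masked = random_spatial_mask(aligned, ratio=0.25)

loss_aligned = partition_infonce(aligned, teacher, num_groups=2, patch_size=4)
loss_unrelated = partition_infonce(unrelated, teacher, num_groups=2, patch_size=4)
loss_masked = partition_infonce(masked, teacher, num_groups=2, patch_size=4)
```

A student whose features track the teacher's gets a lower loss than an unrelated one. Each masked (all-zero) cell contributes exactly `log` of the number of cells per patch, because all of its similarities are zero. Restricting negatives to a local patch keeps the number of similarity computations proportional to the feature-map size rather than its square. That restriction is one plausible way to stay efficient at high resolution, but it is an assumption of this sketch.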

翻译：近年来，基于对比学习的知识蒸馏方法在图像分类和目标检测任务中取得了显著成果。然而，在这一研究方向中，我们注意到语义分割领域受到的关注较少。现有方法严重依赖数据增强和内存缓冲区，这在处理需要保留高分辨率特征图以进行密集像素预测的语义分割任务时，会带来高计算资源需求。为解决这一问题，我们提出无增强稠密对比知识蒸馏（Af-DCD），这是一种新的对比蒸馏学习范式，用于训练紧凑且准确的深度神经网络以应用于语义分割。Af-DCD采用掩码特征模仿策略，并利用在通道和空间维度上的巧妙特征划分，构建了一种新颖的对比学习损失，从而能够在保持训练效率的同时，有效将教师模型学习的密集结构化局部知识迁移到目标学生模型。在五个主流基准数据集上使用多种师生网络对进行的大量实验证明了我们方法的有效性。例如，当选择DeepLabV3-Res101作为教师时，通过Af-DCD训练的DeepLabV3-Res18|DeepLabV3-MBV2模型在Cityscapes数据集上达到77.03%|76.38%的mIOU，刷新了性能记录。此外，与独立训练的对应模型相比，Af-DCD在Cityscapes|Pascal VOC|Camvid|ADE20K|COCO-Stuff-164K上分别实现了3.26%|3.04%|2.75%|2.30%|1.42%的绝对mIOU提升。代码已开源：https://github.com/OSVAI/Af-DCD