Current knowledge distillation approaches in semantic segmentation tend to adopt a holistic strategy that treats all spatial locations equally. However, in dense prediction, a student's predictions in edge regions are highly uncertain because of contextual information leakage, so edge regions require knowledge with higher spatial sensitivity than body regions do. To address this challenge, this paper proposes a novel approach called boundary-privileged knowledge distillation (BPKD). BPKD distills the knowledge of the teacher model's body and edges separately into the compact student model. Specifically, we employ two distinct loss functions: (i) an edge loss, which aims to distinguish between ambiguous classes at the pixel level in edge regions; and (ii) a body loss, which utilizes shape constraints and selectively attends to the inner-semantic regions. Our experiments demonstrate that the proposed BPKD method provides extensive refinement and aggregation for both edge and body regions. Additionally, the method achieves state-of-the-art distillation performance for semantic segmentation on three popular benchmark datasets, highlighting its effectiveness and generalization ability. BPKD shows consistent improvements across a diverse array of lightweight segmentation architectures, including both CNNs and transformers, underscoring its architecture-agnostic adaptability. The code is available at \url{https://github.com/AkideLiu/BPKD}.
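
To make the edge/body split concrete, the following is a minimal sketch in plain Python and not the BPKD implementation. It assumes that an edge pixel is one whose 4-neighbourhood contains a different ground-truth label, and it uses the same per-pixel KL divergence for both regions; the abstract does not specify the exact form of the edge loss or of the body loss with its shape constraints, so these choices, along with the names `edge_mask`, `region_kd_loss`, `split_kd_loss`, `w_edge` and `w_body`, are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def edge_mask(labels):
    """Assumed edge definition: a pixel is 'edge' if any 4-neighbour
    carries a different label. Everything else counts as 'body'."""
    h, w = len(labels), len(labels[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != labels[y][x]:
                    mask[y][x] = True
                    break
    return mask


def region_kd_loss(teacher_logits, student_logits, mask, select):
    """Mean per-pixel KL(teacher || student) over pixels where mask == select."""
    terms = []
    for y, row in enumerate(mask):
        for x, is_edge in enumerate(row):
            if is_edge == select:
                p = softmax(teacher_logits[y][x])
                q = softmax(student_logits[y][x])
                terms.append(kl_divergence(p, q))
    return sum(terms) / len(terms) if terms else 0.0


def split_kd_loss(teacher_logits, student_logits, labels, w_edge=1.0, w_body=1.0):
    """Distill edge and body regions separately, then combine.
    The weights w_edge and w_body are hypothetical hyperparameters."""
    mask = edge_mask(labels)
    edge = region_kd_loss(teacher_logits, student_logits, mask, True)
    body = region_kd_loss(teacher_logits, student_logits, mask, False)
    return w_edge * edge + w_body * body, edge, body
```

Computing the two terms separately lets the edge term be normalised and weighted by the number of edge pixels alone, so that the typically much larger body region does not dominate the gradient signal, which is the motivation the abstract gives for treating the two regions differently.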