Data-fusion networks with duplex encoders achieve impressive performance in visual semantic segmentation, but they become ineffective when spatial geometric data are unavailable. Implicitly infusing the spatial geometric prior knowledge acquired by a duplex-encoder teacher model into a single-encoder student model is a practical, albeit less explored, research avenue. This paper investigates this topic and uses knowledge distillation to address the problem. We introduce the Learning to Infuse "X" (LIX) framework, with novel contributions to both logit distillation and feature distillation. We present a mathematical proof that exposes the limitation of using a single fixed weight in decoupled knowledge distillation, and we introduce a logit-wise dynamic weight controller as a solution. Furthermore, we develop an adaptively recalibrated feature distillation algorithm with two technical novelties: feature recalibration via kernel regression and in-depth feature consistency quantification via centered kernel alignment. Extensive experiments conducted with intermediate-fusion and late-fusion networks across various public datasets provide both quantitative and qualitative evaluations, demonstrating the superior performance of our LIX framework compared with other state-of-the-art approaches.
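To make the feature consistency measure concrete, the following is a minimal sketch of centered kernel alignment in its standard linear form. It is not taken from the LIX implementation: the function name `linear_cka`, the choice of a linear kernel, and the assumption that student and teacher feature maps have been flattened to shape `(n_samples, n_channels)` are all illustrative.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear centered kernel alignment between two feature matrices.

    x: (n_samples, p) features, e.g. from a student layer (assumed layout).
    y: (n_samples, q) features, e.g. from a teacher layer (assumed layout).
    Returns a similarity score in [0, 1].
    """
    # Center each feature dimension over the sample axis.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Linear-kernel HSIC terms expressed with Frobenius norms.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    student = rng.normal(size=(64, 16))  # hypothetical student features
    teacher = rng.normal(size=(64, 8))   # hypothetical teacher features
    print(linear_cka(student, teacher))
```

The score is invariant to isotropic scaling and orthogonal transformations of either feature matrix, and the two inputs may have different channel counts. This is why a measure of this kind can compare a single-encoder student with a duplex-encoder teacher whose feature widths differ.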