Limited data quality makes some essential visual tasks difficult to solve on their own. A common remedy is to introduce previously unavailable information and transfer its informative dark knowledge to the hard task. However, why the transferred knowledge works has not been extensively explored. To address this gap, we analyze features extracted from simple and hard tasks and find a correlation between feature discriminability and dimensional structure (DS). On this basis, we express DS through deep channel-wise correlation and intermediate spatial distribution, and propose a novel cross-modal knowledge distillation (CMKD) method that improves supervised cross-modal learning (CML) performance. The proposed method enforces output features to be channel-wise independent and intermediate features to be uniformly distributed, thereby learning semantically irrelevant features from the hard task to boost its accuracy. This is especially useful in applications where the performance gap between the two modalities is relatively large. Furthermore, we collect a real-world CML dataset to promote community development. The dataset contains more than 10,000 paired optical and radar images and is continuously updated. Experimental results on real-world and benchmark datasets validate the effectiveness of the proposed method.
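The two constraints named in the abstract (channel-wise independent output features, uniformly distributed intermediate features) can be illustrated with a minimal NumPy sketch. The abstract does not give the loss formulas, so everything below is an assumption: the function names `channel_decorrelation_penalty` and `uniformity_penalty` are hypothetical, the first penalizes off-diagonal entries of the channel correlation matrix, and the second swaps in the Gaussian-potential uniformity measure known from contrastive representation learning as a stand-in for whatever the paper actually uses.

```python
import numpy as np


def channel_decorrelation_penalty(features: np.ndarray, eps: float = 1e-8) -> float:
    """Mean squared off-diagonal entry of the channel correlation matrix.

    features: array of shape (N, C), N samples with C channels (C >= 2).
    Assumed form, not the paper's loss: 0 means uncorrelated channels.
    """
    n, c = features.shape
    centered = features - features.mean(axis=0, keepdims=True)
    normalized = centered / (centered.std(axis=0, keepdims=True) + eps)
    corr = normalized.T @ normalized / n          # (C, C) correlation matrix
    off_diag = corr - np.diag(np.diag(corr))      # zero out the diagonal
    return float((off_diag ** 2).sum() / (c * (c - 1)))


def uniformity_penalty(features: np.ndarray, t: float = 2.0, eps: float = 1e-8) -> float:
    """Log of the mean Gaussian potential between L2-normalized feature pairs.

    features: array of shape (N, D) with N >= 2.
    Stand-in measure (an assumption): lower values mean the features are
    spread more uniformly on the unit hypersphere; collapsed features give 0.
    """
    z = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    upper = np.triu_indices(z.shape[0], k=1)      # each unordered pair once
    return float(np.log(np.exp(-t * sq_dists[upper]).mean()))
```

In a distillation setup, terms like these would typically be added to the task loss with weighting coefficients; how the paper combines them across the two modalities is not stated in the abstract.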