We study data-free knowledge distillation (KD) for monocular depth estimation (MDE): learning a lightweight model for real-world depth perception by compressing a trained teacher model when no training data from the target domain is available. Because dense regression differs fundamentally from image classification, previous data-free KD methods are not applicable to MDE. To make data-free KD practical for real-world tasks, we propose to perform KD with out-of-distribution simulated images. The major challenges are i) the lack of prior information about the scene configurations of the real-world training data and ii) the domain shift between simulated and real-world images. To cope with these difficulties, we propose a framework tailored to depth distillation. The framework generates new training samples that cover a multitude of possible object arrangements in the target domain and uses a transformation network to efficiently adapt them to the feature statistics preserved in the teacher model. Through extensive experiments on various depth estimation models and two different datasets, we show that our method outperforms the baseline KD by a good margin and even achieves slightly better performance with as few as 1/6 of the training images.