Recent foundation models demonstrate strong generalization capabilities in monocular depth estimation. However, directly applying these models to Full Surround Monocular Depth Estimation (FSMDE) presents two major challenges: (1) high computational cost, which limits real-time performance, and (2) difficulty in estimating metric-scale depth, as these models are typically trained to predict only relative depth. To address these limitations, we propose a novel knowledge distillation strategy that transfers robust depth knowledge from a foundation model to a lightweight FSMDE network. Our approach leverages a hybrid regression framework that combines a knowledge distillation scheme, traditionally used in classification, with a depth binning module to enhance scale consistency. Specifically, we introduce a cross-interaction knowledge distillation scheme that distills the scale-invariant depth bin probabilities of a foundation model into the student network while guiding it to infer metric-scale depth bin centers from ground-truth depth. Furthermore, we propose view-relational knowledge distillation, which encodes structural relationships among adjacent camera views and transfers them to enhance cross-view depth consistency. Experiments on DDAD and nuScenes demonstrate the effectiveness of our method compared to conventional supervised methods and existing knowledge distillation approaches. Moreover, our method achieves a favorable trade-off between performance and efficiency, meeting real-time requirements.
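The hybrid regression idea in the abstract can be illustrated with a minimal numerical sketch. The snippet below is an assumption-laden toy, not the paper's implementation: `hybrid_kd_losses` is a hypothetical helper in which the student predicts per-pixel depth-bin logits and metric bin centers, a KL term distills the teacher's scale-invariant bin probabilities, and the metric depth (the probability-weighted sum of bin centers) is regressed against ground truth.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the depth-bin axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_kd_losses(student_logits, teacher_probs, bin_centers, gt_depth):
    """Toy sketch of the cross-interaction idea (names are illustrative).

    student_logits: (N, B) student depth-bin logits
    teacher_probs:  (N, B) teacher's scale-invariant bin probabilities
    bin_centers:    (B,)   student's metric-scale bin centers
    gt_depth:       (N,)   ground-truth metric depth
    """
    p_s = softmax(student_logits)
    # Distillation term: KL(teacher || student) over depth bins,
    # transferring the foundation model's relative-depth distribution.
    kl = np.sum(
        teacher_probs * (np.log(teacher_probs + 1e-8) - np.log(p_s + 1e-8)),
        axis=-1,
    ).mean()
    # Metric term: depth is the expectation of bin centers under the
    # student's bin probabilities, supervised by ground-truth depth.
    pred_depth = np.sum(p_s * bin_centers, axis=-1)
    reg = np.abs(pred_depth - gt_depth).mean()
    return kl, reg
```

In this toy setup, the bin *probabilities* carry the teacher's scale-invariant structure while the bin *centers* absorb metric scale from ground truth, which mirrors the division of labor the abstract describes.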


