Scale-invariant and View-relational Representation Learning for Full Surround Monocular Depth

Recent foundation models demonstrate strong generalization in monocular depth estimation. However, directly applying these models to Full Surround Monocular Depth Estimation (FSMDE) presents two major challenges: (1) high computational cost, which limits real-time performance, and (2) difficulty in estimating metric-scale depth, since these models are typically trained to predict only relative depth. To address these limitations, we propose a novel knowledge distillation strategy that transfers robust depth knowledge from a foundation model to a lightweight FSMDE network. Our approach uses a hybrid regression framework that combines knowledge distillation (traditionally used in classification) with a depth binning module to enhance scale consistency. Specifically, we introduce a cross-interaction knowledge distillation scheme that distills the scale-invariant depth bin probabilities of the foundation model into the student network while guiding the student to infer metric-scale depth bin centers from ground-truth depth. Furthermore, we propose view-relational knowledge distillation, which encodes structural relationships among adjacent camera views and transfers them to the student to enhance cross-view depth consistency. Experiments on DDAD and nuScenes demonstrate the effectiveness of our method compared with conventional supervised methods and existing knowledge distillation approaches. Moreover, our method achieves a favorable trade-off between performance and efficiency, meeting real-time requirements.
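
To make the hybrid regression idea concrete, the following is a minimal sketch of how per-pixel depth can be read out from bin probabilities and bin centers, and how a teacher's bin probabilities can be distilled with a classification-style loss. It is an illustration under stated assumptions, not the authors' implementation: the function names, the AdaBins-style normalized bin widths, the depth range, and the use of a KL divergence as the distillation loss are all assumptions that the abstract does not specify.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of bin logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_centers(widths, d_min, d_max):
    """Map relative bin widths to metric bin centers in [d_min, d_max].

    Assumption: centers come from normalized widths (AdaBins-style);
    the abstract only says centers are inferred from ground-truth depth.
    """
    total = sum(widths)
    centers, edge = [], d_min
    for w in widths:
        span = (d_max - d_min) * w / total
        centers.append(edge + span / 2)
        edge += span
    return centers


def expected_depth(probs, centers):
    """Depth for one pixel as the probability-weighted sum of bin centers."""
    return sum(p * c for p, c in zip(probs, centers))


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) over depth bins (assumed distillation loss)."""
    return sum(
        t * math.log((t + eps) / (s + eps))
        for t, s in zip(teacher_probs, student_probs)
    )


# Hypothetical single-pixel example with four depth bins.
teacher = softmax([2.0, 1.0, 0.1, -1.0])   # scale-invariant bin probabilities
student = softmax([1.5, 1.2, 0.0, -0.5])
centers = bin_centers([1, 1, 1, 1], 0.0, 80.0)  # metric centers in meters

distill_loss = kl_divergence(teacher, student)
depth = expected_depth(student, centers)
```

In this sketch the probabilities carry no absolute scale, so they can be matched against a relative-depth teacher, while the metric scale enters only through the bin centers. That separation is the property the abstract attributes to the cross-interaction scheme; the view-relational term is not sketched here.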

