Self-supervised monocular depth estimation (SSMDE) has gained attention in deep learning because it estimates depth without requiring ground-truth depth maps. This approach typically uses a photometric consistency loss between the original image and a synthesized image generated from the estimated depth, thereby reducing the need for extensive dataset acquisition. However, the conventional photometric consistency loss relies on the Lambertian assumption, which often leads to significant errors on reflective surfaces that deviate from this model. To address this limitation, we propose a novel framework that incorporates intrinsic image decomposition into SSMDE. Our method jointly trains monocular depth estimation and intrinsic image decomposition so that each task benefits the other: accurate depth estimation enables multi-image consistency for intrinsic image decomposition by aligning different view coordinate systems, while the decomposition process identifies reflective areas and excludes their corrupted gradients from depth training. Furthermore, our framework introduces a pseudo-depth generation and knowledge distillation technique to further enhance the performance of the student model on both reflective and non-reflective surfaces. Comprehensive evaluations on multiple datasets show that our approach significantly outperforms existing SSMDE baselines in depth prediction, especially on reflective surfaces.
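The two ideas above, a per-pixel photometric error under the Lambertian assumption and the exclusion of reflective regions from the loss, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the plain L1 error (SSMDE systems often combine L1 with SSIM), and the binary `valid_mask` standing in for the decomposition-derived reflective-area mask are all illustrative assumptions.

```python
import numpy as np

def photometric_loss(synthesized, target, valid_mask=None):
    """Illustrative photometric consistency loss.

    synthesized: (H, W, 3) image warped from another view via the
        estimated depth; target: (H, W, 3) original image.
    valid_mask: optional (H, W) binary map where 1 keeps a pixel and
        0 drops it (e.g. reflective areas flagged by the intrinsic
        decomposition, whose gradients would corrupt depth training).
    """
    # Per-pixel L1 error, averaged over color channels. Under the
    # Lambertian assumption this error is small wherever the depth
    # (and hence the warp) is correct; on non-Lambertian surfaces it
    # can stay large even for correct depth.
    err = np.abs(synthesized - target).mean(axis=-1)
    if valid_mask is not None:
        err = err * valid_mask
        return err.sum() / max(valid_mask.sum(), 1)
    return err.mean()
```

Masking the loss rather than the input keeps the forward pass unchanged while preventing reflective pixels from contributing gradients, which is the role the intrinsic decomposition plays in the proposed framework.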