Existing methods for scale-invariant monocular depth estimation (SI MDE) often struggle due to the complexity of the task and the limited size and diversity of available datasets, which hinders generalization to real-world scenarios. In contrast, shift-and-scale-invariant (SSI) depth estimation simplifies the task and enables training on abundant stereo datasets, achieving high performance. We present a novel approach that leverages SSI inputs to enhance SI depth estimation, streamlining the network's role and enabling in-the-wild generalization for SI depth estimation while training only on a synthetic dataset. To emphasize the generation of high-resolution details, we introduce a novel sparse ordinal loss that substantially improves detail generation in SSI MDE, addressing a critical limitation of existing approaches. Through in-the-wild qualitative examples and zero-shot evaluation, we demonstrate the practical utility of our approach in computational photography applications, showcasing its ability to generate highly detailed SI depth maps and to generalize across diverse scenarios.
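To make the idea of a sparse ordinal loss concrete, the sketch below implements a generic pairwise ranking-style depth loss: it samples a sparse set of pixel pairs and penalizes predictions whose depth ordering disagrees with the ground truth, while pulling near-equal pairs together. This is a minimal illustration of the general technique (in the spirit of ordinal ranking losses from the depth literature), not the paper's exact formulation; the function name, the relative-tolerance parameter `tau`, and the pair count are assumptions made for this example.

```python
import numpy as np

def sparse_ordinal_loss(pred, gt, num_pairs=1000, tau=0.02, rng=None):
    """Sketch of a sparse pairwise ordinal loss (hypothetical formulation).

    pred, gt: (H, W) depth maps. Samples `num_pairs` random pixel pairs and
    penalizes predicted depth orderings that disagree with ground truth.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    h, w = gt.shape
    idx = rng.integers(0, h * w, size=(num_pairs, 2))  # sparse random pairs
    p, g = pred.ravel(), gt.ravel()
    dp = p[idx[:, 0]] - p[idx[:, 1]]  # predicted depth difference per pair
    dg = g[idx[:, 0]] - g[idx[:, 1]]  # ground-truth depth difference per pair
    # Ordinal label: +1 / -1 for a clear ordering, 0 when depths are
    # near-equal relative to the pair's mean depth (tolerance tau).
    mean_depth = np.abs(g[idx[:, 0]] + g[idx[:, 1]]) / 2
    ell = np.where(np.abs(dg) > tau * mean_depth, np.sign(dg), 0.0)
    ranked = np.logaddexp(0.0, -ell * dp)  # push pairs toward correct order
    equal = dp ** 2                        # pull near-equal pairs together
    return float(np.mean(np.where(ell != 0.0, ranked, equal)))
```

Because the loss depends only on pairwise orderings (and equality within a tolerance), it is invariant to shift and scale of the prediction, which is what makes ordinal supervision a natural fit for SSI training.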