We propose SharpDepth, a novel approach to monocular metric depth estimation that combines the metric accuracy of discriminative depth estimation methods (e.g., Metric3D, UniDepth) with the fine-grained boundary sharpness typically achieved by generative methods (e.g., Marigold, Lotus). Discriminative models, trained on real-world data with sparse ground-truth depth, predict metric depth accurately but often produce over-smoothed, low-detail depth maps. Generative models, in contrast, are trained on synthetic data with dense ground truth and produce depth maps with sharp boundaries, yet they provide only relative depth with low metric accuracy. SharpDepth bridges these limitations by integrating metric accuracy with detailed boundary preservation, yielding depth predictions that are both metrically precise and visually sharp. Extensive zero-shot evaluations on standard depth estimation benchmarks confirm SharpDepth's effectiveness: it achieves both high depth accuracy and detailed representation, making it well suited for applications requiring high-quality depth perception in diverse real-world environments.