Accurately estimating depth in 360-degree imagery is crucial for virtual reality, autonomous navigation, and immersive media applications. Existing depth estimation methods designed for perspective-view imagery fail when applied to 360-degree images because of differences in camera projection and distortion, while methods designed for 360-degree images underperform due to the scarcity of labeled data pairs. We propose a new depth estimation framework that effectively exploits unlabeled 360-degree data. Our approach uses state-of-the-art perspective depth estimation models as teacher models to generate pseudo labels through a six-face cube projection, enabling efficient labeling of depth in 360-degree images and leveraging the growing availability of large-scale unlabeled data. The framework comprises two main stages: offline mask generation for invalid regions and an online semi-supervised joint training regime. We evaluate our approach on benchmark datasets such as Matterport3D and Stanford2D3D, showing significant improvements in depth estimation accuracy, particularly in zero-shot scenarios. The proposed training pipeline can enhance any 360-degree monocular depth estimator and demonstrates effective knowledge transfer across different camera projections and data types. Project page: https://albert100121.github.io/Depth-Anywhere/
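To make the six-face cube projection step concrete, below is a minimal NumPy sketch of equirectangular-to-cubemap resampling, the geometric core of projecting a 360-degree image into the six perspective views a teacher model can consume. The function name, face-orientation convention, and nearest-neighbour sampling are illustrative assumptions for brevity, not the authors' implementation (which would typically use bilinear sampling).

```python
import numpy as np

def equirect_to_cube_faces(erp, face_size=256):
    """Project an equirectangular (360-degree) image onto six cube faces.

    erp: H x W x C equirectangular image (longitude spans 360 degrees).
    Returns a dict mapping face name -> face_size x face_size x C view.
    Nearest-neighbour sampling keeps the sketch short.
    """
    H, W = erp.shape[:2]
    # Pixel-centre grid on a face plane, in [-1, 1]; x right, y up.
    t = (np.arange(face_size) + 0.5) / face_size * 2.0 - 1.0
    u, v = np.meshgrid(t, -t)
    ones = np.ones_like(u)
    # Outward-facing (x, y, z) direction for each face; +z is "front".
    faces = {
        "front": (u, v, ones),  "back": (-u, v, -ones),
        "right": (ones, v, -u), "left": (-ones, v, u),
        "up":    (u, ones, -v), "down": (u, -ones, v),
    }
    out = {}
    for name, (x, y, z) in faces.items():
        # Viewing direction -> spherical angles -> equirectangular pixels.
        lon = np.arctan2(x, z)                      # [-pi, pi]
        lat = np.arctan2(y, np.sqrt(x**2 + z**2))   # [-pi/2, pi/2]
        px = ((lon / np.pi + 1.0) / 2.0 * (W - 1)).round().astype(int)
        py = ((0.5 - lat / np.pi) * (H - 1)).round().astype(int)
        out[name] = erp[py, px]
    return out
```

Each resulting face is an ordinary 90-degree-FOV perspective image, so a pretrained perspective depth model can be run on it to produce pseudo depth labels, which are then mapped back onto the equirectangular grid to supervise the 360-degree student.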