Accurately estimating depth in 360-degree imagery is crucial for virtual reality, autonomous navigation, and immersive media applications. Existing depth estimation methods designed for perspective-view imagery fail when applied to 360-degree images because of differing camera projections and distortions, while dedicated 360-degree methods perform poorly owing to the scarcity of labeled data pairs. We propose a new depth estimation framework that effectively utilizes unlabeled 360-degree data. Our approach uses state-of-the-art perspective depth estimation models as teacher models to generate pseudo labels through a six-face cube projection technique, enabling efficient labeling of depth in 360-degree images. This method leverages the increasing availability of large datasets. Our approach consists of two main stages: offline mask generation for invalid regions and an online semi-supervised joint training regime. We evaluate our approach on benchmark datasets such as Matterport3D and Stanford2D3D, showing significant improvements in depth estimation accuracy, particularly in zero-shot scenarios. Our proposed training pipeline can enhance any 360-degree monocular depth estimator and demonstrates effective knowledge transfer across different camera projections and data types. See our project page for results: https://albert100121.github.io/Depth-Anywhere/
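To make the six-face cube projection concrete, the sketch below shows one common way to sample a single cube face from an equirectangular 360-degree image: cast a unit ray through each face pixel, convert it to longitude/latitude, and look up the corresponding equirectangular pixel. This is a minimal nearest-neighbor illustration under standard spherical conventions, not the paper's actual implementation; the function names and face layout are our own assumptions.

```python
import numpy as np

def cube_face_directions(face, size):
    """Unit ray directions for one cube face.

    face: one of 'front', 'back', 'left', 'right', 'up', 'down'
    size: face resolution in pixels (size x size)
    """
    # Pixel-center coordinates in [-1, 1] on the face plane.
    t = (np.arange(size) + 0.5) / size * 2.0 - 1.0
    u, v = np.meshgrid(t, t)  # u: left-right, v: top-bottom
    ones = np.ones_like(u)
    # Assumed axis convention: +x right, +y up, +z forward.
    dirs = {
        "front": ( u, -v,  ones),
        "back":  (-u, -v, -ones),
        "right": ( ones, -v, -u),
        "left":  (-ones, -v,  u),
        "up":    ( u,  ones,  v),
        "down":  ( u, -ones, -v),
    }[face]
    d = np.stack(dirs, axis=-1).astype(np.float64)
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

def equirect_to_face(erp, face, size):
    """Sample one cube face from an equirectangular image (H x W x C),
    using nearest-neighbor lookup for simplicity."""
    H, W = erp.shape[:2]
    d = cube_face_directions(face, size)
    lon = np.arctan2(d[..., 0], d[..., 2])        # [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))  # [-pi/2, pi/2]
    # Map longitude/latitude to equirectangular pixel indices.
    x = ((lon / (2.0 * np.pi) + 0.5) * W).astype(int) % W
    y = ((0.5 - lat / np.pi) * H).astype(int).clip(0, H - 1)
    return erp[y, x]

# Usage: extract all six faces, which could then be fed to a
# perspective-view teacher model to produce per-face pseudo depth labels.
erp = np.random.rand(64, 128, 3)
faces = {f: equirect_to_face(erp, f, 32)
         for f in ("front", "back", "left", "right", "up", "down")}
```

In the framework described above, each of the six faces would be passed to the perspective teacher model, and the resulting per-face depth maps (combined with the offline invalid-region masks) would serve as pseudo labels for semi-supervised training of the 360-degree student model. A production pipeline would typically use bilinear interpolation instead of nearest-neighbor lookup to avoid sampling artifacts at face boundaries.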