While recent depth estimation methods exhibit strong zero-shot generalization, achieving accurate metric depth across diverse camera types remains a significant challenge, particularly for cameras with large fields of view (FoV) such as fisheye and 360-degree cameras. This paper presents Depth Any Camera (DAC), a zero-shot metric depth estimation framework that extends a perspective-trained model to handle cameras with varying FoVs. The framework is designed so that all existing 3D data can be leveraged, regardless of the camera types used in new applications. DAC is trained exclusively on perspective images, yet it generalizes to fisheye and 360-degree cameras without specialized training data. DAC employs Equi-Rectangular Projection (ERP) as a unified image representation, enabling consistent processing of images with diverse FoVs. Its key components are a pitch-aware Image-to-ERP conversion for efficient online augmentation in ERP space, a FoV alignment operation that supports effective training across a wide range of FoVs, and multi-resolution data augmentation that addresses resolution disparities between training and testing. DAC achieves state-of-the-art zero-shot metric depth estimation, improving delta-1 ($\delta_1$) accuracy by up to 50% on multiple fisheye and 360-degree datasets compared to prior metric depth foundation models, demonstrating robust generalization across camera types.
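To make the idea of ERP as a unified representation concrete, the following is a minimal sketch of how a single ERP pixel could be mapped back to a pinhole (perspective) image under a camera pitch. It is not DAC's implementation. The function name `erp_pixel_to_perspective`, the axis convention (x right, y down, z forward), the sign of the pitch rotation, and the assumption that the ERP grid covers the full sphere are all illustrative choices not taken from the abstract.

```python
import math


def erp_pixel_to_perspective(u, v, erp_w, erp_h,
                             fx, fy, cx, cy, img_w, img_h, pitch=0.0):
    """Map an ERP pixel (u, v) to perspective-image coordinates.

    Hypothetical helper: returns (px, py), or None when the viewing ray
    falls behind the camera or outside the image bounds.
    """
    # ERP pixel centre -> longitude in (-pi, pi) and latitude in (-pi/2, pi/2).
    # Assumes the ERP grid spans the full sphere.
    lon = (u + 0.5) / erp_w * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / erp_h * math.pi

    # Unit viewing ray in an assumed camera frame: x right, y down, z forward.
    x = math.cos(lat) * math.sin(lon)
    y = -math.sin(lat)
    z = math.cos(lat) * math.cos(lon)

    # Rotate the ray about the x axis by the pitch angle (radians).
    # The sign convention here is an assumption.
    c, s = math.cos(pitch), math.sin(pitch)
    y, z = c * y - s * z, s * y + c * z

    # Rays pointing backwards cannot be seen by a pinhole camera.
    if z <= 1e-9:
        return None

    # Standard pinhole projection with focal lengths and principal point.
    px = fx * x / z + cx
    py = fy * y / z + cy
    if 0.0 <= px < img_w and 0.0 <= py < img_h:
        return px, py
    return None
```

Evaluating this mapping for every ERP pixel and sampling the perspective image at the returned coordinates would fill only the part of the ERP canvas that the camera actually observes, which is one way to see how images with different FoVs can share a single representation.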