This paper tackles the problem of depth estimation from a single image. Existing work focuses either on generalization performance while disregarding metric scale (relative depth estimation) or on state-of-the-art results on specific datasets (metric depth estimation). We propose the first approach that combines the two, leading to a model with excellent generalization performance that still maintains metric scale. Our flagship model, ZoeD-M12-NK, is pre-trained on 12 datasets using relative depth and fine-tuned on two datasets using metric depth. For each domain, we use a lightweight head with a novel bin adjustment design called the metric bins module. During inference, each input image is automatically routed to the appropriate head by a latent classifier. Our framework admits multiple configurations, depending on the datasets used for relative depth pre-training and metric fine-tuning. Even without pre-training, we significantly improve the state of the art (SOTA) on the NYU Depth v2 indoor dataset. Pre-training on 12 datasets and fine-tuning on NYU Depth v2 improves on SOTA further, for a total improvement of 21% in terms of relative absolute error (REL). Finally, ZoeD-M12-NK is the first model that can be trained jointly on multiple datasets (NYU Depth v2 and KITTI) without a significant drop in performance, and it achieves unprecedented zero-shot generalization to eight unseen datasets from both indoor and outdoor domains. The code and pre-trained models are publicly available at https://github.com/isl-org/ZoeDepth.
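For readers unfamiliar with the REL metric, the following is a minimal sketch of how relative absolute error is commonly computed in the depth estimation literature: the mean of |predicted − ground truth| / ground truth over pixels with valid ground truth. The function names, the validity rule (ground truth greater than zero), and the toy depth values are assumptions made for illustration. They are not taken from the paper or its code release.

```python
from typing import Sequence


def abs_rel_error(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels whose ground truth is positive.

    Assumption: a ground-truth value <= 0 marks a pixel without a valid
    depth measurement, so that pixel is skipped.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    ratios = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not ratios:
        raise ValueError("no valid ground-truth pixels")
    return sum(ratios) / len(ratios)


def relative_improvement(baseline: float, improved: float) -> float:
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline


# Toy depths in metres (made-up values, not results from the paper).
gt = [2.0, 4.0, 0.0, 5.0]    # 0.0 marks a pixel with no ground truth
pred = [2.2, 3.6, 1.0, 5.0]

rel = abs_rel_error(pred, gt)
print(f"REL = {rel:.4f}")
```

Under this reading, an improvement figure such as the one quoted above would be a fractional reduction in REL relative to the previous best value, which is what the hypothetical `relative_improvement` helper computes.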