Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity due to unknown camera intrinsics. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion model, with several advancements such as log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity, and synthetically augmenting FOV during training to generalize beyond the limited camera intrinsics in training datasets. Furthermore, by employing a more diverse training mixture than is common, and an efficient diffusion parameterization, our method, DMD (Diffusion for Metric Depth), achieves a 25% reduction in relative error (REL) on zero-shot indoor and 33% reduction on zero-shot outdoor datasets over the current SOTA using only a small number of denoising steps. For an overview, see https://diffusion-vision.github.io/dmd
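As a rough illustration of two ideas named in the abstract, the sketch below shows one plausible way to map metric depth to a bounded log-scale target, and how the field of view changes when an image is center-cropped (one simple way to synthesize a different FOV). This is not the authors' implementation: the depth bounds `D_MIN` and `D_MAX`, the `[-1, 1]` target range, the pinhole-camera assumption, and all function names are hypothetical choices made for this example.

```python
import math

# Hypothetical depth range in metres; the abstract does not state any bounds.
D_MIN, D_MAX = 0.5, 80.0


def depth_to_log_target(depth_m, d_min=D_MIN, d_max=D_MAX):
    """Map metric depth to [-1, 1] on a log scale (assumed target range)."""
    d = min(max(depth_m, d_min), d_max)  # clamp to the modeled range
    unit = math.log(d / d_min) / math.log(d_max / d_min)  # in [0, 1]
    return 2.0 * unit - 1.0


def log_target_to_depth(target, d_min=D_MIN, d_max=D_MAX):
    """Invert depth_to_log_target, returning depth in metres."""
    unit = (min(max(target, -1.0), 1.0) + 1.0) / 2.0
    return d_min * (d_max / d_min) ** unit


def fov_after_center_crop(fov_deg, crop_fraction):
    """FOV (degrees) of a centered crop under a pinhole camera model.

    crop_fraction is the kept fraction of the image extent along the
    axis that fov_deg is measured on (1.0 means no crop).
    """
    half = math.radians(fov_deg) / 2.0
    return math.degrees(2.0 * math.atan(crop_fraction * math.tan(half)))


if __name__ == "__main__":
    for d in (0.5, 2.0, 10.0, 80.0):
        t = depth_to_log_target(d)
        print(f"depth {d:5.1f} m -> target {t:+.3f} -> {log_target_to_depth(t):.3f} m")
    print(f"90 deg FOV, half-width crop -> {fov_after_center_crop(90.0, 0.5):.2f} deg")
```

With a log-scale target, equal steps in the target correspond to equal depth ratios, so nearby indoor depths are not squeezed into a tiny slice of the output range by large outdoor depths. The crop helper shows why FOV can serve as a conditioning signal: cropping changes the apparent scale of the scene while the true depths stay the same.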

