We propose HYBRIDDEPTH, a robust depth estimation pipeline that addresses the unique challenges of depth estimation for mobile AR: scale ambiguity, hardware heterogeneity, and generalizability. HYBRIDDEPTH leverages the camera features available on mobile devices, effectively combining the scale accuracy inherent in Depth from Focus (DFF) methods with the generalization capabilities enabled by strong single-image depth priors. By utilizing the focal planes of a mobile camera, our approach accurately captures depth values at in-focus pixels and uses these values to compute the scale and shift parameters that transform relative depths into metric depths. We test our pipeline as an end-to-end system, with a newly developed mobile client that captures focal stacks and sends them to a GPU-powered server for depth estimation. Through comprehensive quantitative and qualitative analyses, we demonstrate that HYBRIDDEPTH not only outperforms state-of-the-art (SOTA) models on common datasets (DDFF12, NYU Depth v2) and the real-world AR dataset ARKitScenes but also shows strong zero-shot generalization. For example, HYBRIDDEPTH trained on NYU Depth v2 achieves performance on DDFF12 comparable to that of models trained on DDFF12, and it outperforms all SOTA models in zero-shot performance on ARKitScenes. Additionally, a qualitative comparison between our model and the ARCore framework shows that our model's depth maps are significantly more accurate in structural detail and metric accuracy. The source code of this project is available on GitHub.
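The scale-and-shift alignment described above can be sketched as a least-squares fit: given sparse metric depth values at in-focus pixels (e.g., recovered via DFF) and a dense relative depth map from a single-image prior, solve for a scale s and shift t such that s * d_rel + t ≈ d_metric. This is a minimal illustration under assumed inputs, not the paper's actual implementation; the function name and data are hypothetical.

```python
import numpy as np

def fit_scale_shift(rel_depth, metric_depth, mask):
    """Fit s, t minimizing ||s * rel + t - metric||^2 over masked pixels."""
    d_rel = rel_depth[mask]           # relative depths at in-focus pixels
    d_met = metric_depth[mask]        # sparse metric depths (e.g., from DFF)
    A = np.stack([d_rel, np.ones_like(d_rel)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, d_met, rcond=None)
    return s, t

# Toy usage: metric depth is exactly 2 * relative + 0.5 at a few
# "in-focus" pixels, so the fit should recover s = 2.0, t = 0.5.
rng = np.random.default_rng(0)
rel = rng.uniform(0.1, 1.0, size=(4, 4))
met = 2.0 * rel + 0.5
mask = np.zeros((4, 4), dtype=bool)
mask[0, 0] = mask[1, 2] = mask[3, 3] = True

s, t = fit_scale_shift(rel, met, mask)
metric_pred = s * rel + t             # dense metric depth map
```

Once s and t are estimated from the sparse focused-pixel depths, applying them to the full relative depth map yields a dense metric depth map without requiring metric ground truth at every pixel.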