We propose HYBRIDDEPTH, a robust depth estimation pipeline that addresses the unique challenges of depth estimation for mobile AR: scale ambiguity, hardware heterogeneity, and generalizability. HYBRIDDEPTH leverages the camera features available on mobile devices, effectively combining the scale accuracy inherent in Depth from Focus (DFF) methods with the generalization capabilities enabled by strong single-image depth priors. By utilizing the focal planes of a mobile camera, our approach accurately captures depth values at in-focus pixels and uses these values to compute the scale and shift parameters that transform relative depths into metric depths. We test our pipeline as an end-to-end system, with a newly developed mobile client that captures focal stacks and sends them to a GPU-powered server for depth estimation. Through comprehensive quantitative and qualitative analyses, we demonstrate that HYBRIDDEPTH not only outperforms state-of-the-art (SOTA) models on common datasets (DDFF12, NYU Depth v2) and the real-world AR dataset ARKitScenes but also shows strong zero-shot generalization. For example, HYBRIDDEPTH trained on NYU Depth v2 achieves performance on DDFF12 comparable to that of models trained on DDFF12, and it outperforms all SOTA models in zero-shot performance on ARKitScenes. Additionally, a qualitative comparison between our model and the ARCore framework shows that our model's depth maps are significantly more accurate in structural detail and metric accuracy. The source code of this project is available on GitHub.
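The scale-and-shift alignment described above can be sketched as a least-squares fit: given sparse metric depth values at in-focus pixels (e.g., recovered via DFF) and a dense relative depth map from a single-image prior, solve for a scale s and shift t such that s * d_rel + t ≈ d_metric. This is a minimal illustration under assumed inputs, not the paper's actual implementation; the function name and data are hypothetical.

```python
import numpy as np

def fit_scale_shift(rel_depth, metric_depth, mask):
    """Fit s, t minimizing ||s * rel + t - metric||^2 over masked pixels."""
    d_rel = rel_depth[mask]           # relative depths at in-focus pixels
    d_met = metric_depth[mask]        # sparse metric depths (e.g., from DFF)
    A = np.stack([d_rel, np.ones_like(d_rel)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, d_met, rcond=None)
    return s, t

# Toy usage: metric depth is exactly 2 * relative + 0.5 at a few
# "in-focus" pixels, so the fit should recover s = 2.0, t = 0.5.
rng = np.random.default_rng(0)
rel = rng.uniform(0.1, 1.0, size=(4, 4))
met = 2.0 * rel + 0.5
mask = np.zeros((4, 4), dtype=bool)
mask[0, 0] = mask[1, 2] = mask[3, 3] = True

s, t = fit_scale_shift(rel, met, mask)
metric_pred = s * rel + t             # dense metric depth map
```

Once s and t are estimated from the sparse focused-pixel depths, applying them to the full relative depth map yields a dense metric depth map without requiring metric ground truth at every pixel.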