We present MobiFuse, a high-precision depth perception system for mobile devices that combines dual RGB and Time-of-Flight (ToF) cameras. To achieve this, we leverage the physical principles underlying various environmental factors to propose the Depth Error Indication (DEI) modality, which characterizes the depth errors of ToF and stereo matching. Furthermore, we employ a progressive fusion strategy that merges geometric features from the ToF and stereo depth maps with depth error features from the DEI modality to produce precise depth maps. Additionally, we create a new ToF-Stereo depth dataset, RealToF, to train and validate our model. Our experiments demonstrate that MobiFuse outperforms the baselines, reducing depth measurement errors by up to 77.7%. It also shows strong generalization across diverse datasets and proves effective in two downstream tasks: 3D reconstruction and 3D segmentation. A demo video of MobiFuse in real-life scenarios is available at the de-identified YouTube link (https://youtu.be/jy-Sp7T1LVs).