Light-weight time-of-flight (ToF) depth sensors are compact and cost-efficient, and are thus widely used on mobile devices for tasks such as autofocus and obstacle detection. However, because their depth measurements are sparse and noisy, these sensors have rarely been considered for dense geometry reconstruction. In this work, we present the first dense SLAM system that uses a monocular camera and a light-weight ToF sensor. Specifically, we propose a multi-modal implicit scene representation that supports rendering the signals of both the RGB camera and the light-weight ToF sensor; the optimization is driven by comparing these renderings with the raw sensor inputs. Moreover, to guarantee successful pose tracking and reconstruction, we exploit predicted depth as intermediate supervision and develop a coarse-to-fine optimization strategy for efficient learning of the implicit representation. Finally, we explicitly exploit temporal information to deal with the noisy signals from light-weight ToF sensors, improving the accuracy and robustness of the system. Experiments demonstrate that our system effectively exploits the signals of light-weight ToF sensors and achieves competitive results on both camera tracking and dense scene reconstruction. Project page: \url{https://zju3dv.github.io/tof_slam/}.