Dense SLAM based on monocular cameras has immense application value in AR/VR, especially when it runs on a mobile device. In this paper, we propose a novel method that integrates a lightweight depth completion network into a sparse SLAM system using a multi-basis depth representation, so that dense mapping can be performed online even on a mobile phone. Specifically, we present a multi-basis depth completion network, called BBC-Net, optimized for the characteristics of traditional sparse SLAM systems. BBC-Net predicts multiple balanced bases and a confidence map from a monocular image and the sparse points generated by off-the-shelf keypoint-based SLAM systems. The final depth is a linear combination of the predicted depth bases and can be optimized by tuning the corresponding weights. To incorporate the weights seamlessly into traditional SLAM optimization while ensuring efficiency and robustness, we design a set of depth weight factors. These factors make our network a versatile plug-in module that integrates easily into various existing sparse SLAM systems and significantly enhances global depth consistency through bundle adjustment. To verify the portability of our method, we integrate BBC-Net into two representative SLAM systems. Experimental results on various datasets show that the proposed method achieves better performance in monocular dense mapping than state-of-the-art methods. We provide an online demo running on a mobile phone, which verifies the efficiency and mapping quality of the proposed method in real-world scenarios.
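To make the multi-basis representation concrete, the following minimal sketch forms a dense depth map as a linear combination of depth bases and then fits the weights to sparse depth samples. It is an illustration under stated assumptions, not the method of the paper: the function names `combine_bases` and `fit_weights`, the array shapes, and the `damping` parameter are hypothetical, and the closed-form confidence-weighted least squares used here is a simple stand-in for the depth weight factors that the paper optimizes inside bundle adjustment.

```python
import numpy as np


def combine_bases(bases, weights):
    """Dense depth as a linear combination of depth bases.

    bases:   (K, H, W) array of predicted depth bases (assumed layout).
    weights: (K,) array with one weight per basis.
    Returns an (H, W) depth map.
    """
    return np.tensordot(weights, bases, axes=1)


def fit_weights(bases, confidence, sparse_uv, sparse_depth, damping=1e-6):
    """Fit basis weights to sparse depths by confidence-weighted least squares.

    This closed-form solve is an assumed stand-in for illustration only;
    it is not the factor-graph formulation described in the paper.

    confidence:   (H, W) per-pixel confidence, assumed non-negative.
    sparse_uv:    (N, 2) integer (row, col) pixel locations of sparse points.
    sparse_depth: (N,) depths of those points, e.g. from a sparse SLAM map.
    """
    rows, cols = sparse_uv[:, 0], sparse_uv[:, 1]
    A = bases[:, rows, cols].T                    # (N, K) basis values at samples
    c = np.sqrt(confidence[rows, cols])           # (N,) square-root weights
    Aw = A * c[:, None]
    bw = sparse_depth * c
    # Small damping keeps the normal equations well conditioned.
    H = Aw.T @ Aw + damping * np.eye(A.shape[1])
    return np.linalg.solve(H, Aw.T @ bw)
```

Because the dense depth is linear in the weights, only K scalars per frame need to be estimated, which is what makes it plausible to adjust the depth map inside an existing optimization at low cost.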