Visual Simultaneous Localization and Mapping (vSLAM) is a widely used technique in robotics and computer vision that enables a robot to build a map of an unfamiliar environment with a camera while simultaneously tracking its own position over time. In this paper, we propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry and a semantic segmentation of an indoor scene in an online manner. Our pipeline combines classical 3D vision-based tracking and loop closing with neural field-based mapping. Using only a set of keyframes, the mapping network learns the signed distance function (SDF) of the scene as well as the RGB, depth, and semantic maps of any novel view. We further extend the pipeline to large scenes by using multiple local mapping networks. Extensive experiments on well-known benchmark datasets confirm that our approach provides robust tracking, mapping, and semantic labeling even when the input depth is noisy, sparse, or absent. Overall, the proposed algorithm can enhance scene perception and assist with a range of robot control problems.