This paper presents Volume-DROID, a novel approach to Simultaneous Localization and Mapping (SLAM) that integrates volumetric mapping with Differentiable Recurrent Optimization-Inspired Design (DROID). Volume-DROID takes camera images (monocular or stereo) or video frames as input and combines DROID-SLAM, point cloud registration, an off-the-shelf semantic segmentation network, and Convolutional Bayesian Kernel Inference (ConvBKI) to generate a 3D semantic map of the environment and provide accurate localization for the robot. The key innovation of our method is the real-time fusion of DROID-SLAM and ConvBKI, achieved by generating point clouds from RGB-Depth frames and optimized camera poses. This integration is engineered for efficient, timely processing that minimizes lag. Our approach enables real-time online semantic mapping from camera images or stereo video alone. We provide an open-source Python implementation of the algorithm at https://github.com/peterstratton/Volume-DROID.
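The point cloud generation step that links the pose estimates to the volumetric mapper can be illustrated with a minimal sketch. It assumes a pinhole camera model, a dense depth image, and a 4x4 camera-to-world pose matrix; the function name `backproject_to_world` and its parameters are hypothetical and are not taken from the Volume-DROID repository. It is written in plain Python for clarity, whereas a real-time pipeline would vectorize this on arrays or tensors.

```python
def backproject_to_world(depth, intrinsics, cam_to_world):
    """Lift a depth image to a world-frame point cloud (pinhole model).

    Hypothetical helper for illustration only.
    depth        : list of rows of depth values; z <= 0 marks invalid pixels
    intrinsics   : (fx, fy, cx, cy) focal lengths and principal point, in pixels
    cam_to_world : 4x4 nested list, the optimized camera-to-world pose
    """
    fx, fy, cx, cy = intrinsics
    points = []
    for v, row in enumerate(depth):        # v indexes image rows
        for u, z in enumerate(row):        # u indexes image columns
            if z <= 0:
                continue                   # skip pixels without valid depth
            # Back-project pixel (u, v) with depth z into the camera frame.
            p_cam = ((u - cx) * z / fx, (v - cy) * z / fy, z, 1.0)
            # Apply the rigid transform to express the point in the world frame.
            points.append(tuple(
                sum(cam_to_world[r][c] * p_cam[c] for c in range(4))
                for r in range(3)
            ))
    return points


# Toy example: 2x2 depth image, unit focal length, camera shifted 1 unit along x.
depth = [[2.0, 0.0],
         [2.0, 2.0]]
pose = [[1.0, 0.0, 0.0, 1.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
cloud = backproject_to_world(depth, (1.0, 1.0, 0.0, 0.0), pose)
```

In a pipeline of the kind described above, each world-frame point would also carry the semantic label predicted for its source pixel before being passed to the mapping stage; that labeling step is omitted here.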