The availability of real-time semantics greatly improves the core geometric functionality of SLAM systems, enabling numerous robotic and AR/VR applications. We present a new methodology for real-time semantic mapping from RGB-D sequences that combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping. When segmenting a new frame, we re-project latent features from previous frames using differentiable rendering. Fusing these re-projected feature maps with current-frame features greatly improves image segmentation quality compared to a baseline that processes images independently. For 3D map processing, we propose a novel geometric quasi-planar over-segmentation method that relies on surface normals to group 3D map elements likely to belong to the same semantic class. We also describe a novel neural network design for lightweight semantic map post-processing. Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems and matches the performance of 3D convolutional networks on three real indoor datasets, while running in real time. Moreover, it shows better cross-sensor generalization than 3D CNNs, enabling training and inference with different depth sensors. Code and data will be released on the project page: http://jingwenwang95.github.io/SeMLaPS
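To make the idea of normal-based quasi-planar over-segmentation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the map is available as integer voxel coordinates with one unit surface normal per voxel, and it greedily grows regions over 6-connected neighbours whose normals differ by less than an angle threshold. The function name `quasi_planar_segments`, the 6-connectivity, and the 15-degree threshold are all illustrative assumptions.

```python
from collections import deque

import numpy as np


def quasi_planar_segments(voxels, normals, max_angle_deg=15.0):
    """Label voxels by greedy region growing on normal similarity.

    voxels:  iterable of integer (x, y, z) coordinates (hypothetical map layout)
    normals: (N, 3) array of unit surface normals, one per voxel
    Returns an (N,) integer array of segment labels.
    """
    voxels = [tuple(int(c) for c in v) for v in voxels]
    normals = np.asarray(normals, dtype=float)
    index = {v: i for i, v in enumerate(voxels)}  # coordinate -> row index
    cos_thresh = np.cos(np.deg2rad(max_angle_deg))
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    labels = -np.ones(len(voxels), dtype=int)  # -1 means "not yet assigned"
    next_label = 0
    for seed in range(len(voxels)):
        if labels[seed] != -1:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            cur = queue.popleft()
            x, y, z = voxels[cur]
            for dx, dy, dz in offsets:
                nb = index.get((x + dx, y + dy, z + dz))
                if nb is None or labels[nb] != -1:
                    continue
                # Join the neighbour only if its normal is close to the current one.
                if np.dot(normals[cur], normals[nb]) >= cos_thresh:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels


# Toy scene: a 4x4 floor patch (normal +z) touching a wall patch (normal +x).
floor = [(x, y, 0) for x in range(4) for y in range(4)]
wall = [(0, y, z) for y in range(4) for z in range(1, 4)]
voxels = floor + wall
normals = np.array([[0.0, 0.0, 1.0]] * len(floor) + [[1.0, 0.0, 0.0]] * len(wall))

labels = quasi_planar_segments(voxels, normals)
print(len(set(labels.tolist())))  # number of segments found
```

In this toy scene the floor and the wall are adjacent but have perpendicular normals, so they end up in separate segments. Because each neighbour is compared with the voxel it was reached from rather than with the seed, gently curved surfaces can still grow into a single region, which is one plausible reading of "quasi-planar".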