We propose NEDS-SLAM, an Explicit Dense semantic SLAM system based on a 3D Gaussian representation that enables robust 3D semantic mapping, accurate camera tracking, and high-quality rendering in real time. Within the system, we propose a Spatially Consistent Feature Fusion model that reduces the effect of erroneous estimates from a pre-trained segmentation head on semantic reconstruction, yielding robust 3D semantic Gaussian mapping. We also employ a lightweight encoder-decoder to compress the high-dimensional semantic features into a compact 3D Gaussian representation, which mitigates excessive memory consumption. Furthermore, we exploit the efficient and differentiable novel view rendering offered by 3D Gaussian splatting and propose a Virtual Camera View Pruning method that eliminates outlier Gaussian splatting (GS) points, thereby improving the quality of the scene representation. NEDS-SLAM performs competitively with existing dense semantic SLAM methods in mapping and tracking accuracy on the Replica and ScanNet datasets, and also shows excellent capability in 3D dense semantic mapping.
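To make the feature compression step concrete, the following is a minimal sketch of an encoder-decoder that maps a high-dimensional semantic feature to a compact code and back. It is an illustration under stated assumptions, not the NEDS-SLAM implementation: the class name `FeatureCodec`, the purely linear layers, the dimensions `FEAT_DIM = 64` and `LATENT_DIM = 4`, and the untrained random weights are all hypothetical choices made for this example.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Build an (out_dim x in_dim) weight matrix with small random values."""
    scale = 1.0 / in_dim ** 0.5
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def apply_linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class FeatureCodec:
    """Hypothetical linear encoder-decoder; weights are random, not trained."""

    def __init__(self, feat_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.encoder = make_linear(feat_dim, latent_dim, rng)
        self.decoder = make_linear(latent_dim, feat_dim, rng)

    def encode(self, feature):
        # Compact code that would be stored on each Gaussian.
        return apply_linear(self.encoder, feature)

    def decode(self, latent):
        # Expand the stored code back to the original feature dimension.
        return apply_linear(self.decoder, latent)


FEAT_DIM = 64    # assumed size of a segmentation feature vector
LATENT_DIM = 4   # assumed size of the per-Gaussian semantic code

codec = FeatureCodec(FEAT_DIM, LATENT_DIM)
data_rng = random.Random(1)
feature = [data_rng.random() for _ in range(FEAT_DIM)]

latent = codec.encode(feature)      # what each Gaussian would store
restored = codec.decode(latent)     # what a semantic head would consume
compression_ratio = FEAT_DIM / LATENT_DIM

print(len(latent), len(restored), compression_ratio)
```

Under these assumed dimensions, each Gaussian stores 4 values instead of 64, so the semantic payload per Gaussian shrinks by the ratio `FEAT_DIM / LATENT_DIM`. A real system would train the encoder and decoder so that the decoded feature approximates the original one.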