We propose NEDS-SLAM, an explicit dense semantic SLAM system based on a 3D Gaussian representation that enables robust 3D semantic mapping, accurate camera tracking, and high-quality rendering in real time. Within this system, we propose a Spatially Consistent Feature Fusion model that reduces the effect of erroneous estimates from the pre-trained segmentation head on semantic reconstruction, achieving robust 3D semantic Gaussian mapping. In addition, we employ a lightweight encoder-decoder to compress high-dimensional semantic features into a compact 3D Gaussian representation, reducing excessive memory consumption. Furthermore, we exploit the efficient, differentiable novel-view rendering of 3D Gaussian splatting and propose a Virtual Camera View Pruning method that eliminates outlier Gaussian points, thereby improving the quality of the scene representation. On the Replica and ScanNet datasets, NEDS-SLAM achieves mapping and tracking accuracy competitive with existing dense semantic SLAM methods, while also demonstrating strong 3D dense semantic mapping capability.
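As a hedged illustration of the memory argument behind the lightweight encoder-decoder, the sketch below compresses per-Gaussian semantic features from a hypothetical 64 dimensions to an 8-dimensional code and decodes them back. It substitutes a closed-form linear codec (truncated SVD, i.e. PCA) for the learned network, so `LinearFeatureCodec`, `bytes_per_gaussian`, and all dimensions are illustrative assumptions, not NEDS-SLAM's actual architecture.

```python
import numpy as np


class LinearFeatureCodec:
    """Hypothetical linear encoder-decoder for per-Gaussian semantic features.

    Fitted in closed form with a truncated SVD (PCA), the optimal linear
    autoencoder under squared reconstruction error. This is a stand-in for
    a learned lightweight encoder-decoder, not the paper's model.
    """

    def __init__(self, code_dim: int):
        self.code_dim = code_dim
        self.mean = None   # (D,) feature mean
        self.basis = None  # (D, d) orthonormal projection basis

    def fit(self, feats: np.ndarray) -> "LinearFeatureCodec":
        # feats: (N, D) high-dimensional semantic features, e.g. from a segmentation head
        self.mean = feats.mean(axis=0)
        centered = feats - self.mean
        # Rows of vt are principal directions, sorted by decreasing singular value
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        self.basis = vt[: self.code_dim].T
        return self

    def encode(self, feats: np.ndarray) -> np.ndarray:
        # (N, D) -> (N, d): compact codes that would be stored on each Gaussian
        return (feats - self.mean) @ self.basis

    def decode(self, codes: np.ndarray) -> np.ndarray:
        # (N, d) -> (N, D): recover full-dimensional features, e.g. after rendering
        return codes @ self.basis.T + self.mean


def bytes_per_gaussian(dim: int, dtype=np.float32) -> int:
    # Storage cost of one feature vector of length `dim`
    return dim * np.dtype(dtype).itemsize


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_gaussians, feat_dim, code_dim = 2000, 64, 8  # hypothetical sizes
    # Synthetic features that lie on an 8-dimensional subspace
    feats = rng.normal(size=(n_gaussians, code_dim)) @ rng.normal(size=(code_dim, feat_dim))

    codec = LinearFeatureCodec(code_dim).fit(feats)
    codes = codec.encode(feats)
    recon = codec.decode(codes)

    print(codes.shape)                          # (2000, 8)
    print(np.abs(recon - feats).max() < 1e-6)   # True
    print(bytes_per_gaussian(feat_dim) // bytes_per_gaussian(code_dim))  # 8
```

Under these assumed sizes, each Gaussian would carry 256 bytes of float32 semantics uncompressed versus 32 bytes as a code, an 8× saving. A learned encoder-decoder, as described in the text, can in principle capture nonlinear structure that this linear projection cannot.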