We propose SemGauss-SLAM, the first semantic SLAM system to use a 3D Gaussian representation, which enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering in real time. The system incorporates semantic feature embedding into the 3D Gaussian representation, encoding semantic information within the spatial layout of the environment for precise semantic scene representation. We further propose a feature-level loss for updating the 3D Gaussian representation, which provides higher-level guidance for 3D Gaussian optimization. In addition, to reduce cumulative drift and improve reconstruction accuracy, we introduce semantic-informed bundle adjustment, which leverages semantic associations to jointly optimize the 3D Gaussian representation and the camera poses, leading to more robust tracking and more consistent mapping. SemGauss-SLAM outperforms existing dense semantic SLAM methods in mapping and tracking accuracy on the Replica and ScanNet datasets, and also shows excellent capabilities in novel-view semantic synthesis and 3D semantic mapping.
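To make the idea of a feature-level loss on a Gaussian representation concrete, the following is a minimal sketch for a single pixel. It is illustrative only and not the SemGauss-SLAM implementation: the abstract does not specify the rendering equation or the form of the loss, so front-to-back alpha compositing (the usual rendering rule in 3D Gaussian splatting) and a mean absolute (L1) difference are assumptions made here, and the names `composite_features` and `l1_feature_loss` are hypothetical.

```python
from typing import List, Sequence


def composite_features(
    alphas: Sequence[float],
    features: Sequence[Sequence[float]],
) -> List[float]:
    """Blend per-Gaussian feature vectors into one pixel feature.

    Assumption: the Gaussians are already sorted near-to-far along the ray
    and alphas[i] is the opacity that Gaussian i contributes at this pixel.
    """
    dim = len(features[0])
    rendered = [0.0] * dim
    transmittance = 1.0  # fraction of light not yet absorbed
    for alpha, feat in zip(alphas, features):
        weight = transmittance * alpha
        for k in range(dim):
            rendered[k] += weight * feat[k]
        transmittance *= 1.0 - alpha
    return rendered


def l1_feature_loss(rendered: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute difference between rendered and target feature vectors.

    Assumption: the target comes from some 2D feature extractor; the choice
    of L1 here is illustrative, not taken from the abstract.
    """
    return sum(abs(r - t) for r, t in zip(rendered, target)) / len(rendered)


# Two Gaussians on one ray, each carrying a 2-D semantic feature.
pixel_feature = composite_features([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
loss = l1_feature_loss(pixel_feature, [1.0, 0.0])
```

In a full system, a loss of this kind would be summed over pixels and minimized with respect to the per-Gaussian features (and, in bundle adjustment, the camera poses) by a differentiable renderer; this sketch only shows the forward computation.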