We propose SemGauss-SLAM, the first semantic SLAM system built on a 3D Gaussian representation, which enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering in real time. We incorporate semantic feature embedding into the 3D Gaussian representation, which encodes semantic information within the spatial layout of the environment for precise semantic scene representation. We further propose a feature-level loss for updating the 3D Gaussian representation, which provides higher-level guidance for 3D Gaussian optimization. In addition, to reduce cumulative drift and improve reconstruction accuracy, we introduce semantic-informed bundle adjustment, which leverages semantic associations to jointly optimize the 3D Gaussian representation and camera poses, leading to more robust tracking and more consistent mapping. SemGauss-SLAM outperforms existing dense semantic SLAM methods in mapping and tracking accuracy on the Replica and ScanNet datasets, and also shows excellent capabilities in novel-view semantic synthesis and 3D semantic mapping.