Dense simultaneous localization and mapping (SLAM) is pivotal for embodied scene understanding. Recent work has shown that 3D Gaussians enable high-quality reconstruction and real-time rendering of scenes using multiple posed cameras. In this light, we show for the first time that representing a scene by 3D Gaussians can enable dense SLAM using a single unposed monocular RGB-D camera. Our method, SplaTAM, addresses the limitations of prior radiance field-based representations by providing fast rendering and optimization, the ability to determine whether areas have been previously mapped, and structured map expansion by adding more Gaussians. We employ an online tracking and mapping pipeline tailored to the underlying Gaussian representation, with silhouette-guided optimization via differentiable rendering. Extensive experiments show that SplaTAM improves on the state of the art by up to 2× in camera pose estimation, map construction, and novel-view synthesis, while allowing real-time rendering of a high-resolution dense 3D map.