Dense simultaneous localization and mapping (SLAM) is crucial for robotics and augmented reality applications. However, current methods are often hampered by the non-volumetric or implicit way they represent a scene. This work introduces SplaTAM, an approach that, for the first time, leverages explicit volumetric representations, i.e., 3D Gaussians, to enable high-fidelity reconstruction from a single unposed RGB-D camera, surpassing the capabilities of existing methods. SplaTAM employs a simple online tracking and mapping system tailored to the underlying Gaussian representation. It utilizes a silhouette mask to elegantly capture the presence of scene density. This combination enables several benefits over prior representations, including fast rendering and dense optimization, quickly determining if areas have been previously mapped, and structured map expansion by adding more Gaussians. Extensive experiments show that SplaTAM achieves up to 2x superior performance in camera pose estimation, map construction, and novel-view synthesis over existing methods, paving the way for more immersive high-fidelity SLAM applications.
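The abstract mentions using a rendered silhouette mask to quickly determine whether image regions have already been mapped, seeding new Gaussians where they have not. A minimal sketch of that idea, assuming the silhouette is an (H, W) map of accumulated Gaussian opacity in [0, 1] rendered from the current map (the function name `unmapped_mask` and the 0.5 threshold are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def unmapped_mask(silhouette, threshold=0.5):
    """Pixels whose accumulated Gaussian opacity (silhouette) falls below
    the threshold are treated as not yet mapped; the mapping step would
    seed new Gaussians at these pixels to expand the map."""
    return silhouette < threshold

# Toy example: an image whose left half is already covered by the map.
sil = np.zeros((4, 4))
sil[:, :2] = 0.9              # left half: high accumulated opacity (mapped)
mask = unmapped_mask(sil)     # right half flagged for new Gaussians
print(mask.sum())             # -> 8 pixels need new Gaussians
```

In practice the real system also checks per-pixel depth error against the rendered depth before densifying, but the silhouette test above is the core of the "has this area been mapped?" query.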