We present Real-time Gaussian SLAM (RTG-SLAM), a real-time 3D reconstruction system for large-scale environments that uses an RGBD camera and Gaussian splatting. The system features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent: the opaque ones fit the surface and dominant colors, and the transparent ones fit residual colors. By rendering depth differently from color, we let a single opaque Gaussian fit a local surface region well without requiring multiple overlapping Gaussians, which greatly reduces memory and computation cost. For on-the-fly Gaussian optimization, we explicitly add Gaussians in each frame for three types of pixels: newly observed pixels, pixels with large color errors, and pixels with large depth errors. We also categorize all Gaussians as stable or unstable: stable Gaussians are expected to fit previously observed RGBD images well, and all others are unstable. We optimize only the unstable Gaussians and render only the pixels they occupy. This greatly reduces both the number of Gaussians to optimize and the number of pixels to render, so the optimization can run in real time. We show real-time reconstructions of a variety of large scenes. Compared with state-of-the-art NeRF-based RGBD SLAM, our system achieves comparably high-quality reconstruction at around twice the speed and half the memory cost, and shows superior performance in the realism of novel view synthesis and in camera tracking accuracy.
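The per-frame selection of pixels that receive new Gaussians can be illustrated with a minimal NumPy sketch. It is not the system's implementation: the function name, the use of accumulated rendered opacity to detect newly observed pixels, and all threshold values are assumptions made for illustration.

```python
import numpy as np

def select_pixels_for_new_gaussians(
    rendered_color, rendered_depth, rendered_opacity,
    observed_color, observed_depth,
    opacity_thresh=0.5, color_thresh=0.1, depth_thresh=0.05,
):
    """Return boolean masks for the three pixel types named in the abstract.

    All thresholds are hypothetical placeholders, not values from the paper.
    """
    valid = observed_depth > 0  # ignore pixels without a depth reading
    # Assumption: low accumulated opacity means the map does not cover the pixel yet.
    newly_observed = valid & (rendered_opacity < opacity_thresh)
    covered = valid & ~newly_observed
    color_error = np.abs(rendered_color - observed_color).mean(axis=-1)
    depth_error = np.abs(rendered_depth - observed_depth)
    large_color_error = covered & (color_error > color_thresh)
    large_depth_error = covered & (depth_error > depth_thresh)
    return newly_observed, large_color_error, large_depth_error

# Tiny 2x2 frame with one pixel of each type.
observed_color = np.full((2, 2, 3), 0.5)
observed_depth = np.full((2, 2), 1.0)
rendered_color = observed_color.copy()
rendered_depth = observed_depth.copy()
rendered_opacity = np.ones((2, 2))
rendered_opacity[0, 0] = 0.0  # nothing rendered here yet
rendered_color[0, 1] = 0.9    # color differs from the observation
rendered_depth[1, 0] = 1.2    # depth differs from the observation

new_mask, color_mask, depth_mask = select_pixels_for_new_gaussians(
    rendered_color, rendered_depth, rendered_opacity,
    observed_color, observed_depth,
)
```

In this toy frame each mask selects exactly one pixel, and the fourth pixel, which the map already explains, receives no new Gaussian.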
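The stable/unstable split, and the restriction of rendering to pixels occupied by unstable Gaussians, can be sketched in the same hedged way. The per-Gaussian `fit_error` score, the threshold, and the simplification that each pixel is covered by at most one Gaussian are assumptions for illustration only; the abstract does not specify how stability is measured.

```python
import numpy as np

def split_stable_unstable(fit_error, error_thresh=0.05):
    """Label each Gaussian as stable or unstable.

    fit_error is a hypothetical per-Gaussian score of how poorly the Gaussian
    fits previously observed RGBD images; error_thresh is a placeholder.
    """
    stable = fit_error < error_thresh
    return stable, ~stable

def pixels_to_render(pixel_gaussian_ids, unstable):
    """Mark pixels occupied by an unstable Gaussian.

    Simplifying assumption: pixel_gaussian_ids holds the index of the single
    Gaussian covering each pixel, or -1 where no Gaussian covers it.
    """
    covered = pixel_gaussian_ids >= 0
    mask = np.zeros(pixel_gaussian_ids.shape, dtype=bool)
    mask[covered] = unstable[pixel_gaussian_ids[covered]]
    return mask

fit_error = np.array([0.01, 0.20, 0.03])        # three Gaussians
pixel_gaussian_ids = np.array([[0, 1], [2, -1]])  # 2x2 image

stable, unstable = split_stable_unstable(fit_error)
render_mask = pixels_to_render(pixel_gaussian_ids, unstable)
```

Here only one of the three Gaussians is optimized and only one of the four pixels is rendered, which is the source of the savings the abstract describes.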