We present a dense simultaneous localization and mapping (SLAM) method that uses 3D Gaussians as a scene representation. Our approach enables interactive-time reconstruction and photo-realistic rendering from real-world single-camera RGBD videos. To this end, we propose a novel strategy for seeding new Gaussians in newly explored areas, together with their effective online optimization, which is independent of the scene size and thus scalable to larger scenes. This is achieved by organizing the scene into sub-maps that are optimized independently and do not need to be kept in memory. We further achieve frame-to-model camera tracking by minimizing photometric and geometric losses between the input and rendered frames. The Gaussian representation enables high-quality, photo-realistic, real-time rendering of real-world scenes. Evaluation on synthetic and real-world datasets demonstrates competitive or superior performance in mapping, tracking, and rendering compared to existing neural dense SLAM methods.
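The frame-to-model tracking objective described above can be sketched as a weighted sum of a photometric loss on the rendered color image and a geometric loss on the rendered depth. The following is a minimal NumPy sketch under assumed L1 losses; the function name and the loss weights `lambda_photo` and `lambda_geo` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def tracking_loss(rendered_rgb, rendered_depth, observed_rgb, observed_depth,
                  lambda_photo=0.9, lambda_geo=0.1):
    """Combined photometric + geometric loss between a frame rendered from
    the Gaussian map and the observed RGBD frame.

    The L1 form and the weights are assumptions for illustration; tracking
    would minimize this over the camera pose used for rendering.
    """
    # Photometric term: mean absolute color error over all pixels/channels.
    photo = np.abs(rendered_rgb - observed_rgb).mean()
    # Geometric term: mean absolute depth error over all pixels.
    geo = np.abs(rendered_depth - observed_depth).mean()
    return lambda_photo * photo + lambda_geo * geo
```

In an actual tracker, this scalar would be differentiated with respect to the camera pose (e.g. via a differentiable Gaussian rasterizer) and minimized with gradient descent; pixels with unreliable depth are typically masked out before averaging.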