Conventional geometry-based SLAM systems lack dense 3D reconstruction capability because their data association usually relies on feature correspondences, while learning-based SLAM systems often fall short in real-time performance and accuracy. Balancing real-time performance with dense 3D reconstruction capability is therefore a challenging problem. In this paper, we propose a real-time RGB-D SLAM system that adopts a novel-view-synthesis technique, 3D Gaussian Splatting (3DGS), for 3D scene representation and pose estimation. This technique leverages the real-time, rasterization-based rendering of 3DGS and, through its CUDA implementation, supports differentiable optimization in real time. We also enable mesh extraction from the 3D Gaussians for explicit dense 3D reconstruction. To estimate accurate camera poses, we use a rotation-translation decoupled strategy with inverse optimization, in which both components are updated iteratively through gradient-based optimization. Given the existing 3D Gaussian map, each iteration differentiably renders RGB, depth, and silhouette maps and updates the camera parameters to minimize a combined photometric, depth geometric, and visibility loss. However, 3DGS struggles to represent surfaces accurately due to the multi-view inconsistency of 3D Gaussians, which can reduce the accuracy of both camera pose estimation and scene reconstruction. To address this, we use depth priors as additional regularization to enforce geometric constraints, thereby improving the accuracy of both pose estimation and 3D reconstruction. Extensive experimental results on public benchmark datasets demonstrate the effectiveness of the proposed methods in terms of pose accuracy, geometric accuracy, and rendering performance.
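The decoupled pose optimization described above can be illustrated with a deliberately simplified toy sketch. This is a hypothetical 2D illustration, not the paper's CUDA-based 3DGS renderer: `render`, `combined_loss`, and `optimize_pose` are invented names, the "map" is three colored points rather than 3D Gaussians, and numeric central differences stand in for the differentiable rasterizer's analytic gradients. It shows only the alternating rotation/translation updates driven by a weighted photometric + depth + silhouette loss.

```python
import math

# Toy sketch (hypothetical, not the authors' implementation): alternating,
# "decoupled" gradient updates of camera rotation and translation that
# minimize a combined photometric + depth + silhouette loss against values
# rendered from a fixed map.

def render(theta, tz, points):
    """Render each 2D map point into (color, depth, soft silhouette)
    under an in-plane rotation `theta` and a depth translation `tz`."""
    views = []
    for x, y, c in points:
        xr = math.cos(theta) * x - math.sin(theta) * y
        depth = math.sin(theta) * x + math.cos(theta) * y + tz
        color = c / (1.0 + depth * depth)           # depth-attenuated shade
        sil = 1.0 / (1.0 + math.exp(abs(xr) - 5))   # soft visibility mask
        views.append((color, depth, sil))
    return views

def combined_loss(theta, tz, points, target, w=(1.0, 0.5, 0.2)):
    """Weighted sum of photometric, depth, and silhouette L2 terms."""
    pred = render(theta, tz, points)
    terms = [sum((p[i] - t[i]) ** 2 for p, t in zip(pred, target))
             for i in range(3)]
    return sum(wi * ti for wi, ti in zip(w, terms))

def optimize_pose(points, target, iters=300, lr=5e-3, eps=1e-5):
    theta, tz = 0.3, 0.8                            # perturbed initial guess
    for _ in range(iters):
        # Rotation step (translation frozen), central-difference gradient.
        g = (combined_loss(theta + eps, tz, points, target)
             - combined_loss(theta - eps, tz, points, target)) / (2 * eps)
        theta -= lr * g
        # Translation step (rotation frozen).
        g = (combined_loss(theta, tz + eps, points, target)
             - combined_loss(theta, tz - eps, points, target)) / (2 * eps)
        tz -= lr * g
    return theta, tz

points = [(1.0, 2.0, 0.8), (-1.5, 0.5, 0.3), (0.2, -1.0, 0.6)]
target = render(0.0, 0.0, points)   # observations at the true pose
theta, tz = optimize_pose(points, target)
```

In the real system, the loss gradients flow through the differentiable 3DGS rasterizer rather than finite differences, and the silhouette term regularizes which Gaussians are visible from the current view.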