The integration of neural rendering with SLAM systems has recently shown promising results in joint localization and photorealistic view reconstruction. However, existing methods rely entirely on implicit representations and are so resource-hungry that they cannot run on portable devices, which departs from the original intention of SLAM. In this paper, we present Photo-SLAM, a novel SLAM framework built on a hyper primitives map. Specifically, we simultaneously exploit explicit geometric features for localization and learn implicit photometric features to represent the texture of the observed environment. In addition to actively densifying hyper primitives based on geometric features, we introduce a Gaussian-pyramid-based training method that progressively learns multi-level features, improving photorealistic mapping performance. Extensive experiments on monocular, stereo, and RGB-D datasets show that Photo-SLAM significantly outperforms current state-of-the-art SLAM systems for online photorealistic mapping; for example, on the Replica dataset, PSNR is 30% higher and rendering is hundreds of times faster. Moreover, Photo-SLAM runs in real time on an embedded platform such as the Jetson AGX Orin, showing its potential for robotics applications.
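The Gaussian-pyramid-based training mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the kernel, the number of levels, or the training schedule, so everything below is an assumption made for illustration: a 5-tap binomial kernel stands in for the Gaussian, and the hypothetical helper `target_for_iteration` picks the coarsest level first and moves to finer levels as training proceeds. It is not the authors' implementation.

```python
import numpy as np

# 5-tap binomial kernel, a common approximation of a Gaussian (assumed here).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(image: np.ndarray) -> np.ndarray:
    """Separable blur of an (H, W) or (H, W, C) image with edge padding."""
    out = image.astype(np.float64)
    for axis in (0, 1):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (2, 2)
        padded = np.pad(out, pad, mode="edge")
        acc = np.zeros_like(out)
        # Accumulate the weighted, shifted copies along the current axis.
        for k, weight in enumerate(KERNEL):
            index = [slice(None)] * out.ndim
            index[axis] = slice(k, k + out.shape[axis])
            acc += weight * padded[tuple(index)]
        out = acc
    return out


def gaussian_pyramid(image: np.ndarray, levels: int) -> list:
    """Return [full resolution, half resolution, ...] with `levels` entries."""
    pyramid = [image.astype(np.float64)]
    for _ in range(levels - 1):
        # Blur, then keep every second row and column.
        pyramid.append(blur(pyramid[-1])[::2, ::2])
    return pyramid


def target_for_iteration(pyramid: list, iteration: int, iters_per_level: int) -> np.ndarray:
    """Hypothetical coarse-to-fine schedule: coarsest level first."""
    coarsest = len(pyramid) - 1
    level = max(coarsest - iteration // iters_per_level, 0)
    return pyramid[level]
```

In a training loop, the rendered image would be compared against `target_for_iteration(pyramid, step, iters_per_level)` at a matching resolution, so early steps fit coarse structure and later steps fit fine texture. How Photo-SLAM actually schedules the levels is not stated in the abstract.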