Previous attempts to integrate Neural Radiance Fields (NeRF) into the Simultaneous Localization and Mapping (SLAM) framework either assume static scenes or require ground-truth camera poses, which impedes their application in real-world scenarios. This paper proposes a time-varying representation to track and reconstruct dynamic scenes. First, our framework maintains two processes simultaneously: a tracking process and a mapping process. In the tracking process, all input images are uniformly sampled and then progressively trained in a self-supervised paradigm. In the mapping process, we leverage motion masks to distinguish dynamic objects from the static background and sample more pixels from dynamic areas. Second, the parameter optimization for both processes comprises two stages: the first stage associates time with 3D positions to convert the deformation field to the canonical field, and the second stage associates time with the embeddings of the canonical field to obtain colors and a Signed Distance Function (SDF). Finally, we propose a novel keyframe selection strategy based on the overlapping rate. We evaluate our approach on two synthetic datasets and one real-world dataset, and the experiments show that it achieves competitive results in both tracking and mapping compared with existing state-of-the-art NeRF-based dynamic SLAM systems.
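The mask-guided sampling used in the mapping process can be illustrated with a small sketch. The abstract does not specify how many more pixels are drawn from dynamic areas, so the function name `sample_pixels`, the `dynamic_fraction` parameter, and its default of 0.7 are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def sample_pixels(motion_mask, n_samples, dynamic_fraction=0.7, rng=None):
    """Sample pixel coordinates, biased towards dynamic regions.

    motion_mask      -- boolean (H, W) array, True where a pixel is dynamic.
    n_samples        -- total number of pixels to draw.
    dynamic_fraction -- hypothetical share of the budget spent on dynamic
                        pixels (an assumption; the source gives no value).
    Assumes the image contains at least one pixel.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = np.asarray(motion_mask, dtype=bool)

    # Flat indices of dynamic and static pixels.
    dyn_idx = np.flatnonzero(mask)
    sta_idx = np.flatnonzero(~mask)

    # If one region is empty, the whole budget goes to the other one.
    if dyn_idx.size == 0:
        n_dyn = 0
    elif sta_idx.size == 0:
        n_dyn = n_samples
    else:
        n_dyn = int(round(n_samples * dynamic_fraction))
    n_sta = n_samples - n_dyn

    # Draw with replacement so that small regions can still fill their quota.
    empty = np.empty(0, dtype=np.intp)
    dyn = rng.choice(dyn_idx, size=n_dyn, replace=True) if n_dyn else empty
    sta = rng.choice(sta_idx, size=n_sta, replace=True) if n_sta else empty

    # Convert flat indices back to (row, column) coordinates.
    rows, cols = np.unravel_index(np.concatenate([dyn, sta]), mask.shape)
    return rows, cols
```

With a mask in which only a small patch is dynamic, most of the returned coordinates still fall inside that patch, so moving objects get denser supervision than their image area alone would give them.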