We propose a dense dynamic RGB-D SLAM pipeline built on TartanVO, a learning-based visual odometry method. Like other direct methods, and unlike feature-based ones, TartanVO estimates camera pose from dense optical flow, which is valid only for static scenes and does not account for dynamic objects. Because optical flow relies on the color constancy assumption, it cannot by itself differentiate between dynamic and static pixels. Therefore, to reconstruct a static map with such direct methods, our pipeline performs dynamic/static segmentation by leveraging the optical flow output and fuses only static points into the map. Moreover, we re-render the input frames with the dynamic pixels removed and iteratively pass them back into the visual odometry to refine the pose estimate.
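As an illustration of how optical flow output can drive dynamic/static segmentation, the sketch below compares the observed flow with the "rigid" flow that a purely static scene would produce under the estimated camera motion, and flags pixels whose residual exceeds a threshold. This is a minimal sketch under stated assumptions, not the pipeline's actual implementation: the residual-thresholding rule, the function names `rigid_flow` and `dynamic_mask`, the pinhole intrinsics `K`, the relative pose `T_21`, and the threshold `thresh_px` are all hypothetical choices made for the example.

```python
import numpy as np


def rigid_flow(depth, K, T_21):
    """Flow (in pixels) that a static scene would induce between two frames.

    depth : (h, w) depth map of frame 1, in the same units as T_21's translation
    K     : (3, 3) pinhole intrinsics (assumed identical for both frames)
    T_21  : (4, 4) rigid transform mapping frame-1 points into frame 2
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels, (h, w, 3)
    rays = pix @ np.linalg.inv(K).T                    # back-project to normalized rays
    pts1 = rays * depth[..., None]                     # 3-D points in frame 1
    pts2 = pts1 @ T_21[:3, :3].T + T_21[:3, 3]         # same points in frame 2
    proj = pts2 @ K.T                                  # project into frame 2
    with np.errstate(divide="ignore", invalid="ignore"):
        uv2 = proj[..., :2] / proj[..., 2:3]
    return uv2 - pix[..., :2]


def dynamic_mask(flow_obs, depth, K, T_21, thresh_px=1.0):
    """Boolean mask: True where observed flow disagrees with camera-induced flow."""
    residual = np.linalg.norm(flow_obs - rigid_flow(depth, K, T_21), axis=-1)
    valid = depth > 0                                  # ignore pixels without depth
    with np.errstate(invalid="ignore"):
        return (residual > thresh_px) & valid


# Synthetic example: a flat wall 2 m away, camera shifted 0.1 m along x,
# plus a small patch that moves independently of the camera.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 2.0)
T_21 = np.eye(4)
T_21[0, 3] = 0.1

flow_obs = rigid_flow(depth, K, T_21)
flow_obs[10:20, 30:40, 0] += 3.0                       # independently moving patch

mask = dynamic_mask(flow_obs, depth, K, T_21)
static_pixels = ~mask & (depth > 0)                    # candidates for map fusion
```

In this toy setup only the perturbed patch is marked as dynamic, and `static_pixels` selects the points that would be fused into the map. A fixed pixel threshold is the simplest possible rule; in practice it would likely need to account for depth noise and flow uncertainty.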