Accurate and robust camera tracking in dynamic environments is a significant challenge for visual SLAM (Simultaneous Localization and Mapping). Recent progress in this field often relies on deep learning techniques to generate masks for dynamic objects, which usually require GPUs to operate in real time (30 fps). This paper therefore proposes a novel visual SLAM system for dynamic environments that achieves real-time performance on a CPU by incorporating a mask prediction mechanism, which allows the deep learning method and camera tracking to run entirely in parallel at different frequencies, so that neither waits for the other's result. On this basis, the system further introduces a dual-stage optical flow tracking approach and a hybrid use of optical flow and ORB features, which significantly enhance its efficiency and robustness. Compared with state-of-the-art methods, the proposed system maintains high localization accuracy in dynamic environments while achieving a tracking frame rate of 56 fps on a single laptop CPU without any hardware acceleration, demonstrating that deep learning methods remain feasible for dynamic SLAM even without GPU support. To the best of our knowledge, this is the first SLAM system to achieve this.
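The decoupled design described above — a slow segmentation module and a fast tracking loop that never block on each other — can be sketched as two threads sharing only the most recent mask. This is a minimal illustration under assumed names (`segmentation_worker`, `tracking_loop`, placeholder masks and timings), not the paper's actual implementation:

```python
import threading
import time

# Latest mask published by the (slow) segmentation thread; the tracking
# thread always reads whatever is newest instead of waiting for inference.
latest_mask = {"frame_id": -1, "mask": None}
lock = threading.Lock()
stop = threading.Event()

def segmentation_worker():
    # Stand-in for a deep network running at a low frequency.
    frame_id = 0
    while not stop.is_set():
        time.sleep(0.02)                     # simulate slow inference
        mask = f"mask-for-frame-{frame_id}"  # placeholder segmentation result
        with lock:                           # publish the newest mask
            latest_mask["frame_id"] = frame_id
            latest_mask["mask"] = mask
        frame_id += 1

def tracking_loop(n_frames):
    # Stand-in for camera tracking at a higher frequency; it grabs the
    # most recent mask available and never waits for a fresh one.
    used_ids = []
    for _ in range(n_frames):
        time.sleep(0.005)                    # simulate fast per-frame tracking
        with lock:
            used_ids.append(latest_mask["frame_id"])
    return used_ids

worker = threading.Thread(target=segmentation_worker, daemon=True)
worker.start()
history = tracking_loop(40)
stop.set()
worker.join()
```

Because the tracking thread only takes the lock for a dictionary read, its frame rate is bounded by its own work rather than by inference latency; the cost is that some frames reuse a slightly stale mask, which is what the paper's mask prediction mechanism is designed to compensate for.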