Making multi-camera visual SLAM systems easier to set up and more robust to the environment is a long-standing goal in vision-based robotics. Existing monocular and stereo visual SLAM systems have a narrow field of view (FoV) and are fragile in textureless environments, where their accuracy degrades and their robustness is limited. Multi-camera SLAM systems are therefore gaining attention, because their wide FoV provides redundancy against texture degeneration. However, current multi-camera SLAM systems face heavy data-processing loads and rely on elaborately designed camera configurations, which leads to estimation failures for arbitrarily arranged multi-camera systems. To address these problems, we propose a generic visual odometry for arbitrarily arranged multi-camera systems, which achieves metric-scale state estimation with high flexibility in the camera arrangement. Specifically, we first design a learning-based feature extraction and tracking framework to relieve the CPU of the load of processing multiple video streams. We then use the rigid constraints between cameras to estimate metric-scale poses for robust SLAM system initialization. Finally, we fuse the features from all cameras in the SLAM back-end to achieve robust pose estimation and online scale optimization. In addition, multi-camera features improve loop detection for pose graph optimization. Experiments on the KITTI-360 and MultiCamData datasets validate the robustness of our method with arbitrarily placed cameras. Compared with other stereo and multi-camera visual SLAM systems, our method achieves higher pose estimation accuracy and better generalization. Our code and online demos are available at \url{https://github.com/JunhaoWang615/MCVO}
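To illustrate how rigid constraints between cameras can make metric scale observable, the following is a minimal sketch and not the implementation in the MCVO repository. It assumes two monocular trajectories, each with its own unknown global scale, and known extrinsics (R_bA, p_A) and (R_bB, p_B) of the two cameras in a common body frame. Equating the body translation implied by each camera gives, per frame pair, s_A R_bA t_A - s_B R_bB t_B = (I - R_b)(p_B - p_A), which is linear in the two scales. All function and variable names here are hypothetical.

```python
import numpy as np

def rotation(axis, angle):
    """Rotation matrix from an axis and an angle (Rodrigues' formula)."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def camera_motion(R_bc, p_c, R_b, t_b):
    """Relative motion of a camera mounted at (R_bc, p_c) in the body frame
    when the body moves by (R_b, t_b): T_c = T_bc^-1 T_b T_bc."""
    R_c = R_bc.T @ R_b @ R_bc
    t_c = R_bc.T @ (t_b - (np.eye(3) - R_b) @ p_c)
    return R_c, t_c

def solve_scales(R_bA, p_A, R_bB, p_B, motions):
    """Least-squares estimate of the metric scales (s_A, s_B).

    motions: one (R_A, tA_unscaled, tB_unscaled) tuple per frame pair, where
    R_A is camera A's relative rotation and the translations are known only
    up to each camera's global scale.
    """
    rows, rhs = [], []
    for R_A, tA, tB in motions:
        R_b = R_bA @ R_A @ R_bA.T  # body rotation recovered from camera A
        rows.append(np.column_stack((R_bA @ tA, -(R_bB @ tB))))
        rhs.append((np.eye(3) - R_b) @ (p_B - p_A))
    A = np.vstack(rows)          # (3 * n_pairs) x 2
    b = np.concatenate(rhs)      # (3 * n_pairs,)
    scales, *_ = np.linalg.lstsq(A, b, rcond=None)
    return scales

# Synthetic, noise-free check with an assumed two-camera rig.
R_bA, p_A = np.eye(3), np.array([0.5, 0.0, 0.0])
R_bB, p_B = rotation([0, 1, 0], np.pi / 2), np.array([-0.3, 0.2, 0.0])
true_scales = np.array([2.5, 0.4])  # hidden scale of each monocular track
body_motions = [
    (rotation([0, 0, 1], 0.20), np.array([1.0, 0.1, 0.0])),
    (rotation([0, 1, 0], -0.15), np.array([0.8, 0.0, 0.2])),
    (rotation([1, 0, 0], 0.10), np.array([1.2, -0.1, 0.1])),
]
motions = []
for R_b, t_b in body_motions:
    R_A, t_A = camera_motion(R_bA, p_A, R_b, t_b)
    _, t_B = camera_motion(R_bB, p_B, R_b, t_b)
    # Simulate monocular scale ambiguity by dividing out the hidden scales.
    motions.append((R_A, t_A / true_scales[0], t_B / true_scales[1]))

estimated = solve_scales(R_bA, p_A, R_bB, p_B, motions)
print(estimated)
```

In this noise-free setup the estimate matches the hidden scales up to floating-point error. The right-hand side vanishes under pure translation, so the sketch needs some rotation in the motion and a non-zero baseline between the cameras. With real, noisy tracks a robust estimator would likely be needed in place of plain least squares.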