Visual simultaneous localization and mapping (SLAM) systems struggle to detect loop closures under large viewpoint changes. In this paper, we present an object-based loop closure detection method that exploits the spatial layout and semantic consistency of a 3D scene graph. First, we propose an object-level data association approach that combines semantic labels, intersection over union (IoU), object color, and object embeddings. Next, multi-view bundle adjustment over the associated objects jointly optimizes the poses of objects and cameras. We represent the refined objects as a 3D spatial graph with semantics and topology. Then, we propose a graph matching approach that selects corresponding objects based on the similarity of the structural layout and semantic properties of each vertex's neighbors. Finally, we jointly optimize camera trajectories and object poses in an object-level pose graph optimization, which yields a globally consistent map. Experimental results demonstrate that our data association approach constructs more accurate 3D semantic maps, and that our loop closure method is more robust than point-based and object-based methods under large viewpoint changes.
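The two matching steps named above can be illustrated with a minimal sketch. This is not the authors' implementation. Every name (`Detection`, `association_score`, `neighbor_signature`, `vertex_similarity`, `match_graphs`), the weights, the radius, and the thresholds are hypothetical. IoU is computed here on 2D axis-aligned boxes, color is a mean RGB triple, and the layout cue is a simplified stand-in: each object is described by the labels and Euclidean distances of its neighbors. Because pairwise distances do not change when the camera viewpoint changes, such a descriptor can still match a revisited scene seen from a very different angle.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical object observation; all fields are illustrative assumptions."""
    label: str
    box: tuple[float, float, float, float]  # 2D box (x_min, y_min, x_max, y_max)
    color: tuple[float, float, float]  # mean RGB, each channel in [0, 1]
    embedding: list[float]  # appearance feature vector


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity; returns 0 for zero-length vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def association_score(a, b, w=(0.4, 0.2, 0.4)):
    """Score in [0, 1]: the label is a hard gate; IoU, color similarity, and
    embedding similarity are blended with hypothetical weights w."""
    if a.label != b.label:
        return 0.0
    color_sim = 1.0 - math.dist(a.color, b.color) / math.sqrt(3)  # max RGB distance
    emb_sim = max(0.0, cosine(a.embedding, b.embedding))
    return w[0] * box_iou(a.box, b.box) + w[1] * color_sim + w[2] * emb_sim


def neighbor_signature(objects, i, radius):
    """(label, distance) pairs for objects within `radius` of object i.
    `objects` is a list of (label, (x, y, z)) after refinement."""
    _, p_i = objects[i]
    return [(lab, math.dist(p_i, p)) for j, (lab, p) in enumerate(objects)
            if j != i and math.dist(p_i, p) <= radius]


def vertex_similarity(sig_a, sig_b, tol):
    """Fraction of neighbors matched one-to-one by label and distance (within tol)."""
    if not sig_a or not sig_b:
        return 0.0
    used, hits = set(), 0
    for lab, d in sig_a:
        for k, (lab_b, d_b) in enumerate(sig_b):
            if k not in used and lab == lab_b and abs(d - d_b) <= tol:
                used.add(k)
                hits += 1
                break
    return hits / max(len(sig_a), len(sig_b))


def match_graphs(query, candidate, radius=3.0, tol=0.2, min_sim=0.5):
    """Greedy one-to-one correspondences between same-label vertices,
    ranked by neighbor-layout similarity."""
    sigs_q = [neighbor_signature(query, i, radius) for i in range(len(query))]
    sigs_c = [neighbor_signature(candidate, j, radius) for j in range(len(candidate))]
    pairs = []
    for i, (lab_q, _) in enumerate(query):
        for j, (lab_c, _) in enumerate(candidate):
            if lab_q == lab_c:
                s = vertex_similarity(sigs_q[i], sigs_c[j], tol)
                if s >= min_sim:
                    pairs.append((s, i, j))
    pairs.sort(reverse=True)  # best-scoring pairs first
    taken_q, taken_c, matches = set(), set(), []
    for _, i, j in pairs:
        if i not in taken_q and j not in taken_c:
            taken_q.add(i)
            taken_c.add(j)
            matches.append((i, j))
    return sorted(matches)


if __name__ == "__main__":
    query = [("chair", (0, 0, 0)), ("table", (1, 0, 0)), ("tv", (0, 2, 0))]
    # Same scene rotated 90 degrees about z, translated, and reordered.
    candidate = [("tv", (3, 5, 0)), ("chair", (5, 5, 0)), ("table", (5, 6, 0))]
    print(match_graphs(query, candidate))  # [(0, 1), (1, 2), (2, 0)]
```

Greedy assignment keeps the sketch short. A full system would more likely follow correspondence selection with a geometric verification step before accepting a loop closure.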