Loop closure detection (LCD) is a core component of simultaneous localization and mapping (SLAM): it identifies revisited places and enables pose-graph constraints that correct accumulated drift. Classic bag-of-words approaches such as DBoW are efficient but often degrade under appearance change and perceptual aliasing. In parallel, deep-learning-based visual place recognition (VPR) descriptors (e.g., NetVLAD and Transformer-based models) offer stronger robustness, but their computational cost is often viewed as a barrier to real-time SLAM. In this paper, we empirically evaluate NetVLAD as an LCD module and compare it against DBoW on the KITTI dataset. We introduce a Fine-Grained Top-K precision-recall curve that better reflects LCD settings where a query may have zero or multiple valid matches. With Faiss-accelerated nearest-neighbor search, NetVLAD achieves real-time query speed while improving accuracy and robustness over DBoW, making it a practical drop-in alternative for LCD in SLAM.
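The abstract does not spell out how the Fine-Grained Top-K precision-recall curve is computed, so the following is only a rough sketch of a generic pair-level top-K evaluation, not the paper's definition. It assumes each query keeps its K most similar candidates, that a candidate counts as a predicted loop closure only if its similarity reaches a threshold, and that a query may have zero or several valid matches. The function names, the toy similarity scores, and the ground-truth sets are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def top_k_matches(sims: Dict[int, float], k: int) -> List[Tuple[int, float]]:
    """Return the k candidates with the highest similarity, best first."""
    return sorted(sims.items(), key=lambda kv: kv[1], reverse=True)[:k]


def precision_recall(
    similarities: Dict[int, Dict[int, float]],
    ground_truth: Dict[int, Set[int]],
    k: int,
    threshold: float,
) -> Tuple[float, float]:
    """Pair-level precision and recall over all (query, candidate) pairs.

    Assumption (not from the paper): every retrieved candidate whose
    similarity is >= threshold is one prediction; recall is measured
    against the total number of ground-truth pairs.
    """
    tp = fp = 0
    total_gt = sum(len(valid) for valid in ground_truth.values())
    for query, sims in similarities.items():
        valid = ground_truth.get(query, set())  # may be empty: no loop here
        for candidate, score in top_k_matches(sims, k):
            if score < threshold:
                continue  # rejected, so it is not counted as a prediction
            if candidate in valid:
                tp += 1
            else:
                fp += 1
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # no predictions made
    recall = tp / total_gt if total_gt else 0.0
    return precision, recall


# Hypothetical toy data: query frame -> {candidate frame: similarity}.
similarities = {
    10: {0: 0.9, 1: 0.8, 2: 0.3},  # two valid matches
    11: {0: 0.7, 3: 0.2},          # no valid match (place never revisited)
    12: {2: 0.6, 4: 0.5},          # one valid match
}
ground_truth = {10: {0, 1}, 11: set(), 12: {2}}

# Sweeping the threshold traces out one precision-recall curve for a fixed K.
curve = [
    (t, *precision_recall(similarities, ground_truth, k=2, threshold=t))
    for t in (0.55, 0.65, 0.75)
]
```

In this toy setup, raising the threshold removes the false positive on query 11 at the cost of missing the weaker true match on query 12, which is the trade-off a precision-recall curve is meant to expose. In a real pipeline the similarity dictionaries would come from a nearest-neighbor index over image descriptors rather than hand-written scores.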