Because floorplan data is readily available, persistent over the long term, and robust to changes in visual appearance, visual Floorplan Localization (FLoc) has attracted significant attention. Existing methods either match geometric priors or use sparse semantics to reduce FLoc uncertainty. However, they still suffer from ambiguous localization caused by repetitive structures in minimalist floorplans. Moreover, semantic annotations are expensive and limited in coverage, which restricts the applicability of semantic-based methods. To address these issues, we propose DisCo-FLoc, which uses dual-level visual-geometric Contrasts to Disambiguate depth-aware visual FLoc without requiring additional semantic labels. Our solution begins with a ray regression predictor tailored to ray-casting-based FLoc, which draws on depth estimation expertise to predict a set of FLoc candidates. In addition, we propose a novel contrastive learning method with position-level and orientation-level constraints that strictly matches depth-aware visual features with the corresponding geometric structures in the floorplan. Such matches effectively eliminate FLoc ambiguity and select the optimal imaging pose from the FLoc candidates. Extensive comparative studies on two standard visual FLoc benchmarks demonstrate that our method outperforms the state-of-the-art semantic-based method, achieving significant improvements in both robustness and accuracy.
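To make the idea of ray-casting-based candidate generation concrete, the following is a minimal sketch, not the DisCo-FLoc implementation. It assumes the floorplan is a small occupancy grid (1 = wall, 0 = free) and that a learned predictor has already produced a list of ray lengths for the query image; here that list is simply passed in as the hypothetical argument `predicted_rays`. The function names `cast_ray`, `render_rays`, and `rank_candidates`, the fixed-step ray marching, and the mean absolute difference cost are all illustrative assumptions. The contrastive disambiguation stage is not shown.

```python
import math


def cast_ray(grid, x, y, theta, step=0.05, max_range=20.0):
    """March along a ray from (x, y) at angle theta (radians) and return
    the distance at which it first enters an occupied or out-of-bounds cell."""
    dist = 0.0
    while dist < max_range:
        px = x + dist * math.cos(theta)
        py = y + dist * math.sin(theta)
        row, col = math.floor(py), math.floor(px)
        outside = row < 0 or row >= len(grid) or col < 0 or col >= len(grid[0])
        if outside or grid[row][col] == 1:
            return dist
        dist += step
    return max_range


def render_rays(grid, pose, num_rays=9, fov=math.radians(80)):
    """Ray-cast a fan of num_rays (>= 2) rays spanning fov around the pose yaw."""
    x, y, yaw = pose
    angles = [yaw - fov / 2 + fov * i / (num_rays - 1) for i in range(num_rays)]
    return [cast_ray(grid, x, y, a) for a in angles]


def rank_candidates(grid, predicted_rays, candidates, top_k=3):
    """Score each candidate pose by the mean absolute difference between the
    predicted rays and the rays cast from that pose (an assumed cost), and
    return the top_k (cost, pose) pairs with the lowest cost."""
    scored = []
    for pose in candidates:
        rays = render_rays(grid, pose, num_rays=len(predicted_rays))
        cost = sum(abs(p - r) for p, r in zip(predicted_rays, rays)) / len(rays)
        scored.append((cost, pose))
    scored.sort(key=lambda item: item[0])
    return scored[:top_k]
```

In a floorplan with repeated rooms or corridors, several candidate poses can receive nearly identical costs under a purely geometric score like this one, which is the kind of ambiguity that the position-level and orientation-level contrasts described above are meant to resolve.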