Reliable robotic navigation requires both accurate global localization and dense, metric-consistent obstacle perception. A common strategy combines complementary sensing modalities: cameras offer rich visual features for localization, while active sensors such as LiDAR provide direct metric measurements. However, such multi-sensor configurations require complex spatio-temporal calibration and increase deployment overhead. Although vision-only approaches offer a low-cost and scalable alternative, existing monocular systems typically struggle to achieve efficient, globally consistent localization and dense, metric-consistent geometric perception at the same time. To bridge this gap, we propose \textbf{VGP-Nav}, a unified framework for \textit{Metric-Aware Visual Geometric Perception} that relies solely on monocular RGB input to jointly support metric localization and obstacle perception. Our key insight is to anchor localization-grounded visual geometry to physically meaningful scale constraints derived from ground-plane geometry, which provides a reliable metric reference for monocular perception. VGP-Nav resolves monocular scale ambiguity online and produces localization-grounded, metric obstacle representations that can be used directly by downstream planning. Extensive experiments demonstrate strong generalization across diverse environments and successful deployment on real mobile robots, highlighting the practicality of our approach for scalable, low-cost, and safe autonomous navigation.
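
As a generic illustration of how a ground-plane constraint can fix monocular scale, consider the following sketch. It assumes a known camera mounting height $h_{\mathrm{real}}$ above the ground and is not a statement of the exact formulation used in VGP-Nav; the symbols $\hat{\mathbf{n}}$, $\hat{d}$, $s$, and $h_{\mathrm{real}}$ are introduced here for illustration only. Suppose a plane $\hat{\mathbf{n}}^{\top}\mathbf{x} + \hat{d} = 0$ with $\|\hat{\mathbf{n}}\| = 1$ is fitted to ground points reconstructed up to scale in the camera frame. Since the camera center is the origin of that frame, its up-to-scale height above the plane is $|\hat{d}|$, and the metric scale factor follows as
\[
  s = \frac{h_{\mathrm{real}}}{|\hat{d}|}, \qquad \mathbf{x}_{\mathrm{metric}} = s\,\mathbf{x}.
\]
For instance, with $h_{\mathrm{real}} = 0.50\,\mathrm{m}$ and $|\hat{d}| = 0.25$, we obtain $s = 2$, so a point reconstructed at depth $3$ (in up-to-scale units) lies at a metric depth of $6\,\mathrm{m}$.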