Modern autonomous driving depends on accurate metric 3D understanding for perception, reconstruction, and planning, which in turn requires reliable multi-camera depth prediction. However, the outward-facing layout of vehicle-mounted surround-view camera rigs inherently limits visual overlap across views, which undermines the correspondence-based assumptions of conventional multi-view geometry. To bridge this gap, we present SurroundNEXO, named after the Spanish word nexo for a geometric link, a low-overlap multi-camera metric depth framework that grounds cross-view reasoning in ego-centric geometry rather than dense visual correspondences. Rather than enforcing global fusion early, SurroundNEXO first assigns image tokens globally comparable ego-frame viewing directions through Ego-Ray Positional Encoding, then uses sparse LiDAR measurements as metric anchors to propagate absolute scale cues, and finally expands feature interaction progressively from view-local modeling to decomposed spatio-temporal reasoning and global integration. This design enables metric-scale depth prediction with improved spatial consistency across weakly overlapping cameras. Across low-overlap autonomous driving benchmarks, including nuScenes, Waymo, and DDAD, SurroundNEXO reduces single-view error by 33.2%, improves cross-view consistency by 10.5%, and enhances metric reconstruction quality by 25.6% compared with state-of-the-art methods. It also remains robust under extremely sparse depth prompts and exhibits strong zero-shot generalization to unseen camera layouts.
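To make the idea of globally comparable ego-frame viewing directions concrete, the following is a minimal sketch of how per-token ray directions could be computed for one camera. It is an illustration under stated assumptions, not the Ego-Ray Positional Encoding itself: the function name `ego_ray_directions`, the pinhole intrinsics `K`, the camera-to-ego rotation `R_cam_to_ego`, and the patch-based token layout are all hypothetical, and the abstract does not specify how the directions are subsequently turned into an encoding.

```python
import numpy as np

def ego_ray_directions(K, R_cam_to_ego, height, width, patch):
    """Unit viewing direction, in the ego frame, for the centre of each patch token.

    Assumes a pinhole camera with 3x3 intrinsics K and a 3x3 rotation
    R_cam_to_ego (hypothetical inputs, not taken from the paper).
    Returns an array of shape (height // patch, width // patch, 3).
    """
    # Pixel coordinates of the patch centres.
    us = (np.arange(width // patch) + 0.5) * patch
    vs = (np.arange(height // patch) + 0.5) * patch
    u, v = np.meshgrid(us, vs)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixels

    # Back-project to camera-frame rays: d_cam = K^{-1} [u, v, 1]^T.
    rays_cam = pix @ np.linalg.inv(K).T

    # Rotate into the shared ego frame so that tokens from different
    # cameras are expressed in the same coordinate system.
    rays_ego = rays_cam @ R_cam_to_ego.T

    # Normalise to unit length; only the direction is used.
    return rays_ego / np.linalg.norm(rays_ego, axis=-1, keepdims=True)


# Example: a front camera (z forward, x right, y down) mounted on an
# ego frame with x forward, y left, z up.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])
R_front = np.array([[0.0, 0.0, 1.0],
                    [-1.0, 0.0, 0.0],
                    [0.0, -1.0, 0.0]])
rays = ego_ray_directions(K, R_front, height=64, width=64, patch=16)
```

Because every camera's rays are expressed in the same ego frame, two tokens from non-overlapping views can still be compared by their directions, which is the property the abstract relies on when visual correspondences are unavailable.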