Aligning ground-level imagery with geo-registered satellite maps is crucial for mapping, navigation, and situational awareness, yet remains challenging under large viewpoint gaps or when GPS is unreliable. We introduce Wrivinder, a zero-shot, geometry-driven framework that aggregates multiple ground photographs to reconstruct a consistent 3D scene and align it with overhead satellite imagery. Wrivinder combines Structure-from-Motion (SfM) reconstruction, 3D Gaussian Splatting, semantic grounding, and monocular depth--based metric cues to produce a stable zenith-view rendering that can be matched directly against the surrounding satellite imagery for metrically accurate camera geo-localization. Because this task lacks suitable benchmarks, we also release MC-Sat, a curated dataset linking multi-view ground imagery with geo-registered satellite tiles across diverse outdoor environments, to support its systematic evaluation. Together, Wrivinder and MC-Sat provide a first comprehensive baseline and testbed for studying geometry-centered cross-view alignment without paired supervision. In zero-shot experiments, Wrivinder achieves sub-30\,m geolocation accuracy across both dense and large-area scenes, highlighting the promise of geometry-based aggregation for robust ground-to-satellite localization.
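An SfM reconstruction is only defined up to an unknown global scale, so metric geo-localization needs some external cue to fix that scale. The sketch below illustrates one common way monocular metric depth can supply it: take the median of per-pixel ratios between monocular depth and SfM depth, then rescale the reconstruction. This is an illustrative assumption, not Wrivinder's confirmed procedure. The helper names `estimate_metric_scale` and `scale_points` are hypothetical. The median is chosen here only because it is robust to outlier depths such as sky or reflective surfaces.

```python
import math
from statistics import median
from typing import Iterable, List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def estimate_metric_scale(
    sfm_depth: Sequence[float],
    mono_depth: Sequence[float],
    min_depth: float = 1e-6,
) -> float:
    """Hypothetical helper: scale s such that s * sfm_depth ~= mono_depth.

    Uses the median of per-pixel ratios over pixels where both depths are
    finite and positive (an assumed, robust choice; not the paper's method).
    """
    ratios = [
        m / s
        for s, m in zip(sfm_depth, mono_depth)
        if math.isfinite(s) and math.isfinite(m) and s > min_depth and m > min_depth
    ]
    if not ratios:
        raise ValueError("no valid depth correspondences")
    return median(ratios)


def scale_points(points: Iterable[Point3], scale: float) -> List[Point3]:
    """Hypothetical helper: rescale up-to-scale SfM points into metres."""
    return [(scale * x, scale * y, scale * z) for x, y, z in points]


if __name__ == "__main__":
    # Toy depths sampled at the same pixels; the zero SfM depth is ignored.
    s = estimate_metric_scale([1.0, 2.0, 3.0, 0.0], [2.1, 3.9, 6.0, 5.0])
    print(s)
    print(scale_points([(1.0, 2.0, 0.5)], s))
```

Once the reconstruction is in metres, distances measured in a top-down rendering can be compared with satellite ground sampling distance, which is what makes an error figure such as "sub-30 m" meaningful.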