Recent advances in image-based satellite 3D reconstruction have progressed along two complementary directions. On one hand, multi-date approaches using NeRF or Gaussian splatting jointly model appearance and geometry across many acquisitions, achieving accurate reconstructions on opportunistic imagery with numerous observations. On the other hand, classical stereoscopic reconstruction pipelines deliver robust and scalable results for simultaneous or quasi-simultaneous image pairs. However, when the two images are captured months apart, strong seasonal, illumination, and shadow changes violate standard stereoscopic assumptions, causing existing pipelines to fail. This work presents the first Diachronic Stereo Matching method for satellite imagery, enabling reliable 3D reconstruction from temporally distant pairs. Two advances make this possible: (1) fine-tuning a state-of-the-art deep stereo network that leverages monocular depth priors, and (2) exposing it to a dataset specifically curated to include a diverse set of diachronic image pairs. In particular, we start from a pretrained MonSter model, initially trained on a mix of synthetic and real datasets such as SceneFlow and KITTI, and fine-tune it on a set of stereo pairs derived from the DFC2019 remote sensing challenge. This dataset contains both synchronic and diachronic pairs under diverse seasonal and illumination conditions. Experiments on multi-date WorldView-3 imagery demonstrate that our approach consistently surpasses classical pipelines and unadapted deep stereo models in both synchronic and diachronic settings. Fine-tuning on temporally diverse images, together with monocular priors, proves essential for enabling 3D reconstruction from previously incompatible acquisition dates.

Figure 1. Output geometry for a winter-autumn image pair from Omaha (OMA 331 test scene). Panels: left image (winter), right image (autumn), and DSM geometry for ours (1.23 m), zero-shot (3.99 m), and the LiDAR ground truth. Our method recovers accurate geometry despite the strong appearance changes of this diachronic pair, which cause existing zero-shot methods to fail. Missing values due to perspective are shown in black. Mean altitude error is given in parentheses; lower is better.
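
The mean altitude error quoted in the caption can be illustrated with a short sketch. The text here does not define the metric precisely, so the code below makes several assumptions: the error is taken as the mean absolute height difference in metres, both DSMs are already resampled onto the same grid, and missing values (the black pixels) are encoded as NaN. The function name `mean_altitude_error` is hypothetical and is not taken from the authors' code.

```python
import math

def mean_altitude_error(pred_dsm, ref_dsm):
    """Mean absolute altitude difference over pixels valid in both DSMs.

    Assumptions (not from the source): both inputs are 2D sequences of
    heights in metres on the same grid, and missing values are NaN.
    """
    if len(pred_dsm) != len(ref_dsm):
        raise ValueError("DSMs must have the same number of rows")
    total, count = 0.0, 0
    for pred_row, ref_row in zip(pred_dsm, ref_dsm):
        if len(pred_row) != len(ref_row):
            raise ValueError("DSM rows must have the same length")
        for p, r in zip(pred_row, ref_row):
            # Skip pixels missing in either the prediction or the reference.
            if math.isfinite(p) and math.isfinite(r):
                total += abs(p - r)
                count += 1
    # With no overlapping valid pixel, the error is undefined.
    return total / count if count else math.nan

# Toy 2x2 example: one predicted pixel is missing and is ignored.
pred = [[10.0, math.nan], [12.0, 15.0]]
ref = [[11.0, 20.0], [12.0, 13.0]]
print(mean_altitude_error(pred, ref))  # (1 + 0 + 2) / 3 = 1.0
```

Excluding invalid pixels keeps occluded regions from biasing the score. Whether the reported figures also apply a vertical alignment step or a different masking rule is not stated in this section.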