This paper presents a novel approach for cross-view synthesis, aimed at generating plausible ground-level images from corresponding satellite imagery, and vice versa. We refer to these tasks as satellite-to-ground (Sat2Grd) and ground-to-satellite (Grd2Sat) synthesis, respectively. Unlike previous works that typically focus on one-to-one generation, producing a single output image from a single input image, our approach acknowledges the inherent one-to-many nature of the problem: a single input is consistent with many valid outputs due to differences in illumination, weather conditions, and occlusions between the two views. To model this uncertainty effectively, we leverage recent advances in diffusion models. Specifically, we exploit random Gaussian noise to represent the diverse possibilities learned from the target-view data. We further introduce a Geometry-guided Cross-view Condition (GCC) strategy that establishes explicit geometric correspondences between satellite and street-view features. This resolves the geometric ambiguity introduced by the relative camera pose between image pairs and boosts cross-view synthesis performance. Through extensive quantitative and qualitative analyses on three benchmark cross-view datasets, we demonstrate that our geometry-guided cross-view condition outperforms baseline methods, including recent state-of-the-art approaches in cross-view image synthesis, generating images of higher quality, fidelity, and diversity.
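To make the idea of an explicit geometric correspondence concrete, the sketch below resamples an overhead (satellite) feature map into a ground-view panorama layout via a simple polar transform: panorama columns index azimuth around the camera, rows index radial distance. This is a hypothetical illustration of a geometry-guided cross-view condition, not the paper's exact formulation; the function name, resolution choices, and nearest-neighbour sampling are all assumptions.

```python
import numpy as np

def satellite_to_ground_polar(sat_feat, out_h, out_w, max_radius=None):
    """Resample an overhead feature map (H, W, C) into a panorama-shaped
    grid (out_h, out_w, C). Columns sweep azimuth 0..2*pi around the
    assumed camera at the image centre; rows sweep radial distance.
    Hypothetical sketch, not the paper's GCC implementation."""
    h, w, c = sat_feat.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    if max_radius is None:
        max_radius = min(cx, cy)
    # Azimuth angle for each panorama column.
    theta = np.linspace(0, 2 * np.pi, out_w, endpoint=False)
    # Radial distance for each panorama row (camera at radius 0).
    radius = np.linspace(0, max_radius, out_h)
    r, t = np.meshgrid(radius, theta, indexing="ij")
    # Map polar coordinates back to satellite pixel coordinates
    # (nearest-neighbour sampling, clipped to the image bounds).
    xs = np.clip(np.round(cx + r * np.sin(t)).astype(int), 0, w - 1)
    ys = np.clip(np.round(cy - r * np.cos(t)).astype(int), 0, h - 1)
    return sat_feat[ys, xs]  # geometry-aligned conditioning features
```

In a diffusion pipeline, such a resampled map could be concatenated with the noisy target-view latent as conditioning, so that the denoiser receives features already aligned with street-view geometry rather than raw overhead pixels.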