Aerial-to-ground image synthesis is an emerging and challenging problem that aims to synthesize a ground image from an aerial image. Because layout and object representation differ greatly between aerial and ground images, existing approaches often fail to transfer the components of the aerial scene into the ground scene. In this paper, we propose a novel framework that addresses these challenges by imposing enhanced structural alignment and semantic awareness. We introduce a novel semantic-attentive feature transformation module that reconstructs complex geographic structures by aligning the aerial features to the ground layout. Furthermore, we propose semantic-aware loss functions that leverage a pre-trained segmentation network. By computing losses separately for different classes and balancing them, the network is encouraged to synthesize realistic objects across various classes. Extensive experiments, including comparisons with previous methods and ablation studies, demonstrate the effectiveness of the proposed framework both qualitatively and quantitatively.
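To illustrate the idea of computing losses separately per class and balancing them, the following is a minimal sketch and not the paper's actual loss. It assumes a per-pixel label map is already available (for example, from a pre-trained segmentation network), uses a plain L1 reconstruction error, and balances classes by simple averaging over the classes present. The function name `class_balanced_l1` and all parameter names are hypothetical.

```python
import numpy as np

def class_balanced_l1(pred, target, labels, num_classes):
    """Average the per-class mean absolute error over the classes present.

    pred, target: float arrays of shape (H, W, C)
    labels: int array of shape (H, W), one semantic class id per pixel
            (assumed to come from a segmentation network)
    """
    per_pixel = np.abs(pred - target).mean(axis=-1)  # (H, W) error map
    class_losses = []
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            # Each class contributes equally, regardless of its pixel count.
            class_losses.append(per_pixel[mask].mean())
    return float(np.mean(class_losses)) if class_losses else 0.0

# Toy example: only the single pixel of class 1 is reconstructed wrongly.
labels = np.array([[0, 0], [0, 1]])
pred = np.zeros((2, 2, 3))
target = np.zeros((2, 2, 3))
target[1, 1, :] = 1.0

balanced = class_balanced_l1(pred, target, labels, num_classes=2)
unbalanced = float(np.abs(pred - target).mean())
```

In this toy case the balanced loss weights the small class as heavily as the large one, so an error confined to a rare class is not diluted by the many correctly reconstructed pixels of a dominant class.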