Aerial imagery analysis is critical for many research fields. However, frequently acquiring high-quality aerial images is not always feasible because of the effort and cost involved. One solution is to use Ground-to-Aerial (G2A) synthesis to generate aerial images from easily collected ground images. G2A is rarely studied, however, because of its challenges, which include drastic view changes, occlusion, and range of visibility. In this paper, we present a novel Geometric Preserving Ground-to-Aerial image synthesis (GPG2A) model that generates realistic aerial images from ground images. GPG2A consists of two stages. The first stage predicts the Bird's Eye View (BEV) segmentation (referred to as the BEV layout map) from the ground image. The second stage synthesizes the aerial image from the predicted BEV layout map and text descriptions of the ground image. To train our model, we present a new multi-modal cross-view dataset, VIGORv2, which is built upon VIGOR with newly collected aerial images, maps, and text descriptions. Our extensive experiments show that GPG2A synthesizes aerial images that preserve geometry better than those of existing models. We also present two applications, data augmentation for cross-view geo-localization and sketch-based region search, to further verify the effectiveness of GPG2A. The code and data will be publicly available.
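The two-stage structure described in the abstract can be pictured as a simple composition of two functions. The sketch below is only an illustration of that data flow, not the authors' implementation: the names `TwoStageG2A`, `predict_layout`, `synthesize`, `toy_layout`, and `toy_synth` are hypothetical, and the toy stand-ins use thresholding in place of the learned segmentation and generative models.

```python
from dataclasses import dataclass
from typing import Callable, List

# All names here are hypothetical placeholders, not the authors' code.
Image = List[List[int]]      # toy image: grid of grayscale pixel values
LayoutMap = List[List[int]]  # toy BEV layout map: grid of semantic class ids


@dataclass
class TwoStageG2A:
    """Composes stage 1 (ground image -> BEV layout map) with
    stage 2 (BEV layout map + text description -> aerial image)."""
    predict_layout: Callable[[Image], LayoutMap]
    synthesize: Callable[[LayoutMap, str], Image]

    def __call__(self, ground_image: Image, text: str) -> Image:
        layout = self.predict_layout(ground_image)  # stage 1
        return self.synthesize(layout, text)        # stage 2


def toy_layout(ground: Image) -> LayoutMap:
    # Assumption: a threshold stands in for a learned BEV segmentation model.
    return [[1 if px > 127 else 0 for px in row] for row in ground]


def toy_synth(layout: LayoutMap, text: str) -> Image:
    # Assumption: a class-to-intensity lookup stands in for a generative
    # model; this toy version accepts the text but does not use it.
    return [[255 if cls == 1 else 0 for cls in row] for row in layout]


pipeline = TwoStageG2A(predict_layout=toy_layout, synthesize=toy_synth)
aerial = pipeline([[0, 200], [130, 10]], "a residential street")
```

Keeping the layout map as an explicit intermediate value reflects the abstract's point that geometry is fixed in the first stage, so the second stage only has to render appearance on top of it.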