Recent advances in generative AI have significantly increased the online dissemination of altered images and videos, raising serious concerns about the credibility of digital media distributed over the Internet, through information channels, and across social networks. This issue particularly affects domains that rely heavily on trustworthy data, such as journalism, forensic analysis, and Earth observation. To address these concerns, the ability to geolocate a non-geo-tagged ground-view image without external information, such as GPS coordinates, has become increasingly critical. This study tackles the challenge of linking a ground-view image, potentially exhibiting varying fields of view (FoV), to its corresponding satellite image without the aid of GPS data. To achieve this, we propose a novel four-stream Siamese-like architecture, the Quadruple Semantic Align Net (SAN-QUAD), which extends previous state-of-the-art (SOTA) approaches by applying semantic segmentation to both ground and satellite imagery. Experimental results on a subset of the CVUSA dataset demonstrate significant improvements of up to 9.8% over prior methods across various FoV settings.
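The cross-view matching idea described above can be sketched at a very high level: encode the ground and satellite views, each paired with its semantic-segmentation map, through four streams, fuse each view's features into a descriptor, and score candidate satellite images by cosine similarity. The following is a purely illustrative NumPy sketch; the encoders, weight sharing, and fusion scheme here are hypothetical placeholders, not the SAN-QUAD implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_OUT = 32 * 32 * 3, 64
# Four streams: ground RGB, ground segmentation, satellite RGB, satellite
# segmentation. Siamese-like assumption (hypothetical): the two streams of a
# given modality (RGB or segmentation) share the same projection weights.
W_rgb = rng.normal(size=(D_IN, D_OUT))
W_seg = rng.normal(size=(D_IN, D_OUT))

def encode(x, W):
    # Placeholder encoder: flatten, linearly project, L2-normalize.
    f = x.reshape(-1) @ W
    return f / np.linalg.norm(f)

def embed(rgb, seg):
    # Fuse the RGB and semantic-segmentation features of one view
    # into a single L2-normalized descriptor.
    f = np.concatenate([encode(rgb, W_rgb), encode(seg, W_seg)])
    return f / np.linalg.norm(f)

# Random stand-ins for a ground image/segmentation pair and a
# satellite image/segmentation pair.
ground = embed(rng.normal(size=(32, 32, 3)), rng.normal(size=(32, 32, 3)))
satellite = embed(rng.normal(size=(32, 32, 3)), rng.normal(size=(32, 32, 3)))

# Retrieval score: cosine similarity between the two view descriptors;
# geolocation picks the satellite candidate with the highest score.
score = float(ground @ satellite)
```

In a real system the linear projections would be deep convolutional encoders trained with a metric-learning loss so that matching ground/satellite pairs score higher than non-matching ones.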