Pre-trained Foundation Models (PFMs) have ushered in a paradigm shift in Artificial Intelligence, owing to their ability to learn general-purpose representations that can be readily employed in a wide range of downstream tasks. While PFMs have been successfully adopted in fields such as Natural Language Processing and Computer Vision, their capacity for handling geospatial data and answering urban questions remains limited. This can be attributed to the intrinsic heterogeneity of geospatial data, which encompasses different data types, including points, segments, and regions, as well as multiple information modalities, such as spatial position, visual characteristics, and textual annotations. The proliferation of Volunteered Geographic Information initiatives and the ever-increasing availability of open geospatial data sources such as OpenStreetMap (OSM), which is freely accessible globally, present a promising opportunity to bridge this gap. In this paper, we present CityFM, a self-supervised framework to train a foundation model within a selected geographical area of interest, such as a city. CityFM relies solely on open data from OSM and produces multimodal representations of entities of different types, incorporating spatial, visual, and textual information. We analyse the entity representations generated by our foundation models from a qualitative perspective, and conduct quantitative experiments on road-, building-, and region-level downstream tasks. We compare CityFM's results to those of algorithms tailored specifically to the respective applications. In all experiments, CityFM achieves performance superior to, or on par with, the baselines.