Pre-trained Foundation Models (PFMs) have ushered in a paradigm shift in Artificial Intelligence, owing to their ability to learn general-purpose representations that can be readily employed in a wide range of downstream tasks. While PFMs have been successfully adopted in fields such as Natural Language Processing and Computer Vision, their capacity to handle geospatial data and answer urban questions remains limited. This can be attributed to the intrinsic heterogeneity of geospatial data, which encompasses different data types, including points, segments, and regions, as well as multiple information modalities, such as spatial position, visual characteristics, and textual annotations. The proliferation of Volunteered Geographic Information initiatives and the ever-increasing availability of open geospatial data sources such as OpenStreetMap (OSM), which is freely accessible globally, present a promising opportunity to bridge this gap. In this paper, we present CityFM, a self-supervised framework to train a foundation model within a selected geographical area of interest, such as a city. CityFM relies solely on open data from OSM and produces multimodal representations of entities of different types, incorporating spatial, visual, and textual information. We analyse the entity representations generated by our foundation models from a qualitative perspective, and conduct quantitative experiments on road-, building-, and region-level downstream tasks. We compare CityFM's results to those of algorithms tailored specifically to the respective applications. In all experiments, CityFM achieves performance superior to, or on par with, the baselines.