With the ever-increasing volume of Earth observation data held in the archives of large programmes such as Copernicus, there is a growing need for efficient vector representations of the underlying raw data. Extracting feature representations from pretrained deep neural networks is a powerful approach that can provide semantic abstractions of the input data. However, how this should be done for imagery archives containing geospatial data has not yet been defined. In this work, an extension is proposed to an existing community project, Major TOM, which focuses on the provision and standardization of open and free AI-ready datasets for Earth observation. Furthermore, four global and dense embedding datasets are released openly and for free alongside this manuscript, resulting in the most comprehensive global open dataset of geospatial visual embeddings in terms of the Earth's surface covered.