The lack of high-quality labeled data is one of the main bottlenecks in training deep learning models. As tasks increase in complexity, the penalty for overfitting and unstable learning grows. The typical paradigm employed today is self-supervised learning, in which a model first learns from a large corpus of unstructured, unlabeled data and then transfers that knowledge to the target task. Notable examples of self-supervision in other modalities are BERT for large language models, Wav2Vec for speech recognition, and the Masked Autoencoder for vision, all of which use Transformers to solve a masked prediction task. GeoAI is uniquely positioned to take advantage of the self-supervised methodology because decades of data have been collected, little of which is precisely and dependably annotated. Our goal is to extract building and road segmentations from Digital Elevation Models (DEMs), which provide a detailed topography of the Earth's surface. The proposed architecture is a Masked Autoencoder pre-trained on ImageNet (with the limitation that there is a large domain discrepancy between ImageNet and DEM data) with an UperNet head for decoding segmentations. We tested this model with only 450 and 50 training images, roughly 5% and 0.5% of the original data, respectively. On the building segmentation task, the model obtains 82.1% Intersection over Union (IoU) with 450 images and 69.1% IoU with only 50 images. On the more challenging road detection task, the model obtains 82.7% IoU with 450 images and 73.2% IoU with only 50 images. Any hand-labeled dataset of the Earth's surface made today will be immediately obsolete because the landscape is constantly changing. This motivates the need for data-efficient learners that can be used for a wide variety of downstream tasks.
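The results above are reported as Intersection over Union. As a minimal sketch of how IoU can be computed for a single binary segmentation mask, the following uses NumPy; the function name `iou`, the toy masks, and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not details taken from the paper's evaluation code.

```python
import numpy as np


def iou(pred, target):
    """Intersection over Union for two binary masks of the same shape.

    Hypothetical helper for illustration; the paper's own evaluation
    protocol (thresholds, per-class averaging) is not specified here.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)


# Toy masks: 2 overlapping pixels, 4 pixels in the union.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
score = iou(pred, target)
```

For the toy masks, the intersection covers 2 pixels and the union 4, so `score` is 0.5. A dataset-level figure such as those quoted above would additionally require a choice of how to aggregate across images, which this sketch leaves open.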