Using Global Land Cover Product as Prompt for Cropland Mapping via Visual Foundation Model

Data-driven deep learning methods have shown great potential in cropland mapping. However, owing to factors such as cropland attributes (topography, climate, crop type) and imaging conditions (viewing angle, illumination, scale), croplands in different scenes exhibit a large domain gap. This makes it difficult for models trained on specific scenes to generalize directly to other scenes. A common way to handle this problem is the "Pretrain+Fine-tuning" paradigm. Unfortunately, given how varied cropland features are under these factors, it is hard to bridge the complex domain gap between pre-training data and target data using only sparse fine-tuning samples as general constraints. Moreover, as the number of model parameters grows, fine-tuning is no longer an easy, low-cost task. With the emergence of prompt learning via visual foundation models, the "Pretrain+Prompting" paradigm redesigns the optimization target by introducing an individual prompt for each sample. This simplifies domain adaptation from generic to specific scenes during model inference. We therefore introduce the "Pretrain+Prompting" paradigm to the interpretation of cropland scenes and design an auto-prompting (APT) method based on a freely available global land cover product. It achieves fine-grained adaptation from generic scenes to specialized cropland scenes without additional labeling cost. To the best of our knowledge, this work is the first to explore domain adaptation for cropland mapping from a prompt-learning perspective. Experiments on two sub-meter cropland datasets from southern and northern China demonstrate that the proposed method via visual foundation models outperforms traditional supervised learning and fine-tuning approaches in the field of remote sensing.
