Land-cover change detection using paired OpenStreetMap data and optical high-resolution imagery via object-guided Transformer

Optical high-resolution imagery and OpenStreetMap (OSM) data are two important data sources for land-cover change detection. Previous studies on these two data sources focus on using the information in OSM data to aid change detection on multi-temporal optical high-resolution images. This paper pioneers the direct detection of land-cover changes from paired OSM data and optical imagery, thereby broadening the scope of change detection tasks to encompass more dynamic earth observations. To this end, we propose an object-guided Transformer (ObjFormer) architecture that combines the prevalent object-based image analysis (OBIA) technique with the vision Transformer architecture. Introducing OBIA can significantly reduce the computational overhead and memory burden of the self-attention module. Specifically, ObjFormer has a hierarchical pseudo-siamese encoder consisting of object-guided self-attention modules that extract representative features at different levels from OSM data and optical images; a decoder consisting of object-guided cross-attention modules then progressively recovers the land-cover changes from the extracted heterogeneous features. In addition to the basic supervised binary change detection task, this paper introduces a new semi-supervised semantic change detection task that does not require any manually annotated land-cover labels of optical images to train semantic change detectors. Two lightweight semantic decoders are added to ObjFormer to accomplish this task efficiently. A converse cross-entropy loss is designed to fully utilize the negative samples, which contributes to a large performance improvement on this task. The first large-scale benchmark dataset containing 1,287 map-image pairs ($1024 \times 1024$ pixels for each sample) covering 40 regions on six continents ...(see the manuscript for the full abstract)
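
The abstract states that OBIA reduces the cost of self-attention but does not spell out the mechanism. One plausible reading, sketched below purely as an assumption, is that pixel tokens attend to one pooled token per object instead of to every other pixel token, so the number of query-key pairs drops from N x N to N x M for N tokens and M objects. The function names (`pool_by_object`, `object_guided_attention`, `attention_pairs`) are hypothetical, learned projections and multiple heads are omitted, and this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_by_object(tokens, object_ids):
    """Average the token vectors that share an object id (assumed pooling rule).

    Returns one vector per object, ordered by ascending object id.
    """
    sums, counts = {}, {}
    for vec, oid in zip(tokens, object_ids):
        if oid not in sums:
            sums[oid] = [0.0] * len(vec)
            counts[oid] = 0
        sums[oid] = [a + b for a, b in zip(sums[oid], vec)]
        counts[oid] += 1
    return [[v / counts[oid] for v in sums[oid]] for oid in sorted(sums)]


def object_guided_attention(tokens, object_ids):
    """Each pixel token queries object-level keys/values.

    Simplification: queries, keys and values are the raw features
    (no learned projections, single head).
    """
    objects = pool_by_object(tokens, object_ids)
    scale = 1.0 / math.sqrt(len(tokens[0]))
    outputs = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in objects]
        weights = softmax(scores)
        outputs.append(
            [sum(w * obj[d] for w, obj in zip(weights, objects)) for d in range(len(query))]
        )
    return outputs


def attention_pairs(num_tokens, num_objects=None):
    """Number of query-key pairs: N*N for plain attention, N*M when keys are objects."""
    return num_tokens * (num_tokens if num_objects is None else num_objects)


# Toy example: four pixel tokens that belong to two objects.
tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
object_ids = [0, 0, 1, 1]
attended = object_guided_attention(tokens, object_ids)
```

Under this assumption, a 1024 x 1024 sample split into a few thousand objects needs far fewer key/value tokens than full pixel-level attention, which is consistent with the overhead reduction the abstract describes; the actual object-guided modules in ObjFormer may differ in how objects are derived and pooled.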
