With recent advances in aerospace technology, the volume of unlabeled remote sensing image (RSI) data has increased dramatically. Leveraging this data effectively through self-supervised learning (SSL) is vital in remote sensing. However, current methods, particularly contrastive learning (CL), a leading SSL approach, face two challenges in this domain. First, CL often treats geographically adjacent samples with similar semantic content as negative pairs, which confuses model training. Second, as an instance-level discriminative task, it tends to neglect the fine-grained features and complex details inherent in unstructured RSIs. To overcome these obstacles, we introduce SwiMDiff, a novel self-supervised pre-training framework designed for RSIs. SwiMDiff employs a scene-wide matching approach that recalibrates labels so that data from the same scene are recognized as false negatives, making CL better suited to remote sensing. In addition, SwiMDiff integrates CL with a diffusion model: pixel-level diffusion constraints enhance the encoder's ability to capture both the global semantic information and the fine-grained features of the images. The proposed framework enriches the information available for downstream remote sensing tasks, and it performs strongly in change detection and land-cover classification.
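As a rough illustration of the false-negative idea described above, the following sketch computes an InfoNCE-style contrastive loss in which pairs from the same scene are dropped from the set of negatives. It is not the SwiMDiff implementation: the function name `scene_masked_info_nce`, the `scene_ids` argument, the temperature value, and the choice to mask same-scene pairs out of the denominator (rather than relabel them in some other way) are all assumptions made for illustration, and the diffusion constraint is not covered.

```python
import numpy as np

def scene_masked_info_nce(z_a, z_b, scene_ids, temperature=0.2):
    """InfoNCE over two augmented views (hypothetical sketch).

    z_a, z_b  : (N, D) embeddings of the two views of N images.
    scene_ids : length-N labels; equal labels mean "same scene" (assumed input).
    Off-diagonal pairs that share a scene are treated as false negatives
    and excluded from the softmax denominator.
    """
    # L2-normalise so the dot product is cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (N, N)

    scene_ids = np.asarray(scene_ids)
    same_scene = scene_ids[:, None] == scene_ids[None, :]
    positives = np.eye(len(scene_ids), dtype=bool)
    false_negatives = same_scene & ~positives

    # Remove false negatives from the denominator by setting them to -inf.
    masked = np.where(false_negatives, -np.inf, logits)

    # Numerically stable log-sum-exp per row; the diagonal is never masked,
    # so every row keeps at least one finite entry.
    row_max = masked.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(masked - row_max).sum(axis=1))

    # Negative log-likelihood of the true positive (the diagonal entry).
    return float(np.mean(log_denominator - np.diag(logits)))
```

Under these assumptions, masking same-scene pairs can only shrink the denominator, so the loss is never larger than the unmasked loss on the same embeddings. A batch in which every image belongs to one scene has no negatives left, and the loss collapses to zero.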