Change detection in remote sensing images is essential for tracking environmental changes on the Earth's surface. Despite the success of vision transformers (ViTs) as backbones in numerous computer vision applications, they remain underutilized in change detection, where convolutional neural networks (CNNs) continue to dominate due to their powerful feature extraction capabilities. In this paper, we show that ViTs have a distinct advantage in discerning large-scale changes, a capability where CNNs fall short. Capitalizing on this insight, we introduce ChangeViT, a framework that adopts a plain ViT backbone to improve the detection of large-scale changes. The framework is supplemented by a detail-capture module that generates detailed spatial features and a feature injector that efficiently integrates fine-grained spatial information into high-level semantic learning. This feature integration ensures that ChangeViT excels both at detecting large-scale changes and at capturing fine-grained details, providing comprehensive change detection across diverse scales. Without bells and whistles, ChangeViT achieves state-of-the-art performance on three popular high-resolution datasets (i.e., LEVIR-CD, WHU-CD, and CLCD) and one low-resolution dataset (i.e., OSCD), which underscores the potential of plain ViTs for change detection. Furthermore, thorough quantitative and qualitative analyses validate the efficacy of the introduced modules and further support the effectiveness of our approach. The source code is available at https://github.com/zhuduowang/ChangeViT.