Fine-grained remote sensing image segmentation is essential for accurately identifying detailed objects in remote sensing images. Recently, vision transformer models (VTMs) pretrained on large-scale datasets have shown strong zero-shot generalization, indicating that they have learned general knowledge of object understanding. We introduce a novel end-to-end learning paradigm that combines knowledge guidance with domain refinement to enhance performance. The paradigm comprises two key components: the Feature Alignment Module (FAM) and the Feature Modulation Module (FMM). FAM aligns features from a CNN-based backbone with those from the pretrained VTM's encoder using channel transformation and spatial interpolation, and transfers knowledge via a KL divergence loss and an L2 normalization constraint. FMM further adapts the transferred knowledge to the target domain to address domain shift. We also introduce a fine-grained grass segmentation dataset and demonstrate, through experiments on two datasets, that our method achieves significant improvements of 2.57 mIoU on the grass dataset and 3.73 mIoU on the cloud dataset. The results highlight the potential of combining knowledge transfer and domain adaptation to overcome domain-related challenges and data limitations. The project page is available at https://xavierjiezou.github.io/KTDA/.
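The feature-alignment idea behind FAM can be illustrated with a minimal numpy sketch: project the CNN features into the teacher's channel space, resample them to the teacher's spatial resolution, then score the match with a KL divergence over a channel-wise softmax plus an L2 distance between normalized features. This is a simplified illustration under stated assumptions, not the paper's implementation; the function names, nearest-neighbor resampling, and the channel-softmax choice are assumptions for the sketch.

```python
import numpy as np

def channel_transform(feat, weight):
    # 1x1 convolution as a matrix multiply over the channel axis:
    # (C_in, H, W) -> (C_out, H, W)
    c, h, w = feat.shape
    return (weight @ feat.reshape(c, -1)).reshape(weight.shape[0], h, w)

def resize_nearest(feat, size):
    # Nearest-neighbor spatial interpolation to (size, size);
    # the paper's interpolation scheme may differ.
    c, h, w = feat.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return feat[:, ys][:, :, xs]

def alignment_loss(student, teacher, eps=1e-8):
    # KL divergence between channel-wise softmax distributions,
    # plus an L2 distance between channel-normalized features.
    s = student.reshape(student.shape[0], -1)
    t = teacher.reshape(teacher.shape[0], -1)
    ps = np.exp(s) / np.exp(s).sum(axis=0, keepdims=True)
    pt = np.exp(t) / np.exp(t).sum(axis=0, keepdims=True)
    kl = (pt * (np.log(pt + eps) - np.log(ps + eps))).sum(axis=0).mean()
    sn = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
    tn = t / (np.linalg.norm(t, axis=0, keepdims=True) + eps)
    l2 = ((sn - tn) ** 2).sum(axis=0).mean()
    return kl + l2

# Usage: align a hypothetical 4-channel CNN feature map to a 16x16 teacher grid.
rng = np.random.default_rng(0)
cnn_feat = rng.standard_normal((4, 8, 8))
teacher_feat = rng.standard_normal((4, 16, 16))
aligned = resize_nearest(channel_transform(cnn_feat, np.eye(4)), 16)
loss = alignment_loss(aligned, teacher_feat)
```

In training, the channel-transform weights would be learned so that minimizing this loss pulls the backbone's features toward the pretrained encoder's representation.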