Optical and Synthetic Aperture Radar (SAR) image registration is crucial for multi-modal image fusion and its downstream applications. However, several challenges limit the performance of existing deep learning-based methods for cross-modal image registration: (i) significant nonlinear radiometric variations between optical and SAR images hinder shared feature learning and matching; (ii) the limited texture in such images hampers discriminative feature extraction; (iii) the local receptive field of Convolutional Neural Networks (CNNs) restricts the learning of contextual information, while the Transformer can capture long-range global features but at high computational cost. To address these issues, this paper proposes a multi-expert learning framework with a State Space Model (ME-SSM) for optical and SAR image registration. First, to improve registration performance under limited texture, ME-SSM constructs a multi-expert learning framework to capture shared features from multi-modal images. Specifically, it extracts features from various transformations of the input image and employs a learnable soft router to dynamically fuse them, thereby enriching feature representations and improving registration accuracy. Second, ME-SSM introduces a state space model, Mamba, for feature extraction; it employs a multi-directional cross-scanning strategy to efficiently capture global contextual relationships with linear complexity. ME-SSM can thus expand the receptive field and enhance registration accuracy without incurring high computational costs. Additionally, ME-SSM uses a multi-level feature aggregation (MFA) module to strengthen multi-scale feature fusion and interaction. Extensive experiments demonstrate the effectiveness and advantages of the proposed ME-SSM for optical and SAR image registration.
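To illustrate the soft-router idea described above, the following is a minimal sketch (not the paper's implementation): features extracted from several transformed views of the input are fused by softmax gating weights produced by a learnable router. The function names and toy feature vectors here are hypothetical, and the router logits would in practice come from a trained network.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of router scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def soft_router_fusion(expert_features, router_logits):
    """Dynamically fuse per-expert feature vectors with softmax gating.

    expert_features: list of K feature vectors, one per transformed view
    router_logits:   length-K score vector from a learnable router
    Returns the weighted-sum fused feature and the gating weights.
    """
    weights = softmax(router_logits)
    dim = len(expert_features[0])
    fused = [sum(w * f[i] for w, f in zip(weights, expert_features))
             for i in range(dim)]
    return fused, weights

# Toy example: three "experts" on three transformations of one image
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = soft_router_fusion(feats, [2.0, 0.5, -1.0])
```

Because the gating is a softmax rather than a hard argmax, all experts contribute to every sample and the router remains differentiable end-to-end.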
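The multi-directional cross-scanning strategy mentioned for the Mamba branch can be sketched as follows. This is an assumption-laden illustration of the common four-direction scheme (row-major, reversed row-major, column-major, reversed column-major), where each 1-D sequence would be fed to a selective state space model with linear complexity; the exact scan order used by ME-SSM may differ.

```python
def cross_scan(fmap):
    """Unfold a 2-D feature map (list of rows) into four 1-D scan
    sequences so a 1-D state space model can aggregate global context
    from multiple directions.
    """
    h, w = len(fmap), len(fmap[0])
    rows = [fmap[i][j] for i in range(h) for j in range(w)]  # left-to-right, top-to-bottom
    cols = [fmap[i][j] for j in range(w) for i in range(h)]  # top-to-bottom, left-to-right
    # Forward and reversed traversals along both axes: 4 directions total.
    return [rows, rows[::-1], cols, cols[::-1]]

# Toy 2x2 feature map; each entry stands for one spatial token.
seqs = cross_scan([[1, 2],
                   [3, 4]])
```

Scanning in all four directions lets every position attend to context from every side of the image once the per-direction outputs are merged, approximating a global receptive field without quadratic attention cost.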