OSDM-MReg: Multimodal Image Registration based One Step Diffusion Model

Multimodal remote sensing image registration aligns images from different sensors for data fusion and analysis. However, current methods often fail to extract modality-invariant features when aligning image pairs with large nonlinear radiometric differences. To address this issue, we propose OSDM-MReg, a novel multimodal image registration framework based on image-to-image translation to eliminate the gap between multimodal images. First, we propose a novel one-step unaligned target-guided conditional denoising diffusion probabilistic model (UTGOS-CDDPM) to translate multimodal images into a unified domain. At inference, a traditional conditional DDPM generates the translated source image through a large number of iterations, which severely slows down the image registration task. To address this issue, we use the unaligned target image as a condition to promote the generation of low-frequency features of the translated source image. Furthermore, during training, we add an inverse process that directly predicts the translated image, ensuring that the translated source image can be generated in one step at test time. Additionally, to supervise the detail features of the translated source image, we propose a new perceptual loss that focuses on the high-frequency feature differences between the translated and ground-truth images. Finally, a multimodal multiscale image registration network (MM-Reg) fuses the features of the unimodal and multimodal images through the proposed multimodal feature fusion strategy. Experiments demonstrate superior accuracy and efficiency across various multimodal registration tasks, particularly for SAR-optical image pairs.
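
The abstract gives no formulas, so the following is only a minimal sketch of two ideas it mentions, under stated assumptions. It assumes the standard DDPM forward process with a cumulative noise schedule value `alpha_bar_t`, and shows how a noise estimate yields a clean-image estimate in a single step. The FFT-based high-pass L1 term is a simple stand-in for a loss that emphasizes high-frequency differences; it is not the perceptual loss proposed in the paper. All function names and the `cutoff` parameter are hypothetical.

```python
import numpy as np

def forward_noise(x0, alpha_bar_t, rng):
    """Standard DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps

def predict_x0_one_step(x_t, eps_hat, alpha_bar_t):
    """Invert the forward step in closed form, given a noise estimate eps_hat."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

def high_pass(img, cutoff=0.1):
    """Zero out spatial frequencies below `cutoff` (cycles per pixel) via an FFT mask."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    mask = np.sqrt(fx**2 + fy**2) >= cutoff
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))

def high_frequency_l1(pred, target, cutoff=0.1):
    """Mean absolute difference between the high-pass filtered images (assumed stand-in loss)."""
    return float(np.mean(np.abs(high_pass(pred, cutoff) - high_pass(target, cutoff))))

# Demo with a random "image" and a perfect noise estimate (illustration only).
rng = np.random.default_rng(0)
x0 = rng.random((32, 32))
x_t, eps = forward_noise(x0, 0.5, rng)
x0_hat = predict_x0_one_step(x_t, eps, 0.5)
hf_loss = high_frequency_l1(x0_hat, x0)
```

In a real system the noise estimate would come from a trained network conditioned on the source and unaligned target images; here the true noise is reused only to show that the closed-form inversion recovers `x0`.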
