Recently, diffusion models have brought novel insights to pan-sharpening and notably boosted fusion precision. However, most existing models perform diffusion in pixel space and train a distinct model for each type of multispectral (MS) imagery, suffering from high latency and sensor-specific limitations. In this paper, we present SALAD-Pan, a sensor-agnostic latent-space diffusion method for efficient pan-sharpening. Specifically, SALAD-Pan trains a band-wise single-channel VAE that encodes high-resolution multispectral (HRMS) images into compact latent representations, supporting MS images with arbitrary channel counts and establishing a basis for acceleration. Spectral physical properties, together with the PAN and MS images, are then injected into the diffusion backbone through unidirectional and bidirectional interactive control structures, respectively, achieving high-precision fusion during the diffusion process. Finally, a lightweight cross-spectral attention module is added at the central layer of the diffusion model, reinforcing spectral connections to improve spectral consistency and further raise fusion precision. Experimental results on GaoFen-2, QuickBird, and WorldView-3 demonstrate that SALAD-Pan outperforms state-of-the-art diffusion-based methods on all three datasets, attains a 2-3x inference speedup, and exhibits robust zero-shot (cross-sensor) capability.