CrossDiff: Exploring Self-Supervised Representation of Pansharpening via Cross-Predictive Diffusion Model

Fusion of a panchromatic (PAN) image and corresponding multispectral (MS) image is also known as pansharpening, which aims to combine abundant spatial details of PAN and spectral information of MS. Due to the absence of high-resolution MS images, available deep-learning-based methods usually follow the paradigm of training at reduced resolution and testing at both reduced and full resolution. When taking original MS and PAN images as inputs, they always obtain sub-optimal results due to the scale variation. In this paper, we propose to explore the self-supervised representation of pansharpening by designing a cross-predictive diffusion model, named CrossDiff. It has two-stage training. In the first stage, we introduce a cross-predictive pretext task to pre-train the UNet structure based on conditional DDPM, while in the second stage, the encoders of the UNets are frozen to directly extract spatial and spectral features from PAN and MS, and only the fusion head is trained to adapt for pansharpening task. Extensive experiments show the effectiveness and superiority of the proposed model compared with state-of-the-art supervised and unsupervised methods. Besides, the cross-sensor experiments also verify the generalization ability of proposed self-supervised representation learners for other satellite's datasets. We will release our code for reproducibility.

翻译：全色（PAN）图像与对应多光谱（MS）图像的融合被称为全色锐化，其目标在于融合PAN的丰富空间细节与MS的光谱信息。由于缺乏高分辨率MS图像，现有深度学习方法通常遵循低分辨率训练、低分辨率与全分辨率共同测试的范式。当以原始MS和PAN图像作为输入时，因尺度变化问题，此类方法往往仅能获得次优结果。本文提出通过设计名为CrossDiff的交叉预测扩散模型，探索全色锐化的自监督表示。该模型采用两阶段训练：第一阶段引入交叉预测预训练任务，基于条件DDPM对UNet结构进行预训练；第二阶段冻结UNet编码器以直接提取PAN和MS的空间与光谱特征，仅训练融合头以适应全色锐化任务。大量实验表明，与当前最先进的监督与无监督方法相比，所提模型具有有效性与优越性。此外，跨传感器实验验证了所提自监督表示学习器对其他卫星数据集具有泛化能力。为保障可复现性，我们将公开代码。

相关内容

关注 0

多媒体系统（MS）期刊详细介绍了多媒体计算，通信，存储和应用的各个方面的创新研究思想，新兴技术，最新方法和工具。它包含理论，实验和调查文章。多媒体系统的覆盖范围包括：在计算机系统中集成数字视频和音频功能；多媒体信息编码和数据交换格式；数字多媒体的操作系统机制；数字视频和音频网络与通信；存储模型和结构；用于支持多媒体应用程序的方法、范式、工具和软件体系结构；多媒体应用程序和应用程序接口，以及多媒体终端系统架构。官网地址：http://dblp.uni-trier.de/db/journals/mms/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日