Fusion-based hyperspectral image (HSI) super-resolution has become increasingly popular because it can integrate high-frequency spatial information from a paired high-resolution (HR) RGB reference image. However, most existing methods either rely heavily on accurate alignment between the low-resolution (LR) HSI and the RGB image, or can only handle simulated unaligned RGB images generated by rigid geometric transformations, which limits their effectiveness in real scenes. In this paper, we explore fusion-based HSI super-resolution with real RGB reference images that exhibit both rigid and non-rigid misalignments. To address the limitations of existing methods on unaligned reference images, we propose an HSI fusion network with heterogeneous feature extraction, multi-stage feature alignment, and attentive feature fusion. Specifically, our network first transforms the input HSI and RGB images into two sets of multi-scale features with an HSI encoder and an RGB encoder, respectively. The RGB reference features are then processed by a multi-stage alignment module that explicitly aligns them with the LR HSI features. Finally, the aligned RGB reference features are further adjusted by an adaptive attention module so that they focus on discriminative regions, and are then sent to the fusion decoder, which generates the reconstructed HR HSI. Additionally, we collect a real-world HSI fusion dataset, consisting of paired HSIs and unaligned RGB references, to support the evaluation of the proposed model in real scenes. Extensive experiments on both simulated data and our real-world dataset show that our method obtains a clear improvement over existing single-image and fusion-based super-resolution methods in both quantitative assessment and visual comparison.
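The abstract describes the pipeline only at a high level, so the following NumPy sketch is a rough illustration of the data flow (align the RGB reference to the LR HSI, reweight its features, then fuse) and not the proposed network. Every component is a hand-written stand-in chosen for this example: the learned encoders are replaced by average pooling, the multi-stage alignment module by a single exhaustive search over global integer shifts (a rigid translation only, so it does not cover the non-rigid case the paper targets), the adaptive attention module by a simple mismatch-based gate, and the fusion decoder by an addition. All function names and parameters (`avg_pool`, `upsample`, `estimate_shift`, `fuse`, `radius`) are hypothetical.

```python
import numpy as np

def avg_pool(x, s):
    """Downsample a (C, H, W) array by s x s average pooling (H, W divisible by s)."""
    c, h, w = x.shape
    return x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))

def upsample(x, s):
    """Nearest-neighbour upsampling of a (C, h, w) array by an integer factor s."""
    return x.repeat(s, axis=1).repeat(s, axis=2)

def estimate_shift(lr_hsi, hr_rgb, s, radius=4):
    """Alignment stand-in (assumption): exhaustive search for the global integer
    shift of the RGB image whose pooled intensity best matches the LR HSI."""
    target = lr_hsi.mean(axis=0, keepdims=True)   # (1, h, w) HSI intensity
    gray = hr_rgb.mean(axis=0, keepdims=True)     # (1, H, W) RGB intensity
    best, best_err = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Circularly shift the reference, then compare at LR resolution.
            cand = avg_pool(np.roll(gray, (dy, dx), axis=(1, 2)), s)
            err = np.mean((cand - target) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def fuse(lr_hsi, hr_rgb, s, radius=4):
    """Toy fusion: align the RGB reference, gate its high-frequency detail,
    and add the gated detail to every band of the upsampled HSI."""
    dy, dx = estimate_shift(lr_hsi, hr_rgb, s, radius)
    aligned = np.roll(hr_rgb, (dy, dx), axis=(1, 2)).mean(axis=0, keepdims=True)
    base = upsample(lr_hsi, s)                    # (C, H, W) blurry HSI
    low = upsample(avg_pool(aligned, s), s)       # low-frequency part of the reference
    detail = aligned - low                        # high-frequency part of the reference
    # Attention stand-in (assumption): trust the reference less where its
    # low-frequency content disagrees with the HSI intensity.
    mismatch = np.abs(low - base.mean(axis=0, keepdims=True))
    gate = np.exp(-mismatch)                      # values in (0, 1]
    return base + gate * detail, (dy, dx)
```

Calling `fuse(lr_hsi, hr_rgb, s)` with an LR HSI of shape `(C, h, w)` and an RGB reference of shape `(3, s*h, s*w)` returns an array of shape `(C, s*h, s*w)` together with the estimated shift. The sketch also assumes the two inputs have comparable intensity scales, which a real pipeline would have to handle explicitly.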