Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation. However, the prevailing CNN-based approaches are limited in building long-range dependencies and in capturing interaction information between spectral features. This results in inadequate utilization of spectral information and in artifacts after upsampling. To address these issues, we propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure. Specifically, we first introduce a robust and spectral-friendly similarity metric, i.e., the spectral correlation coefficient (SCC), which replaces the original attention matrix and incorporates inductive biases into the model to facilitate training. Building on it, we further use the kernelizable attention technique, with theoretical support, to form a novel efficient SCC-kernel-based self-attention (ESSA) that reduces attention computation to linear complexity. ESSA enlarges the receptive field for features after upsampling without adding much computation and allows the model to effectively utilize spatial-spectral information from different scales, resulting in more natural high-resolution images. Our experiments demonstrate ESSA's effectiveness in both visual quality and quantitative results, without requiring pretraining on large-scale datasets.
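The two ideas named above, a correlation-based similarity and kernelized attention with linear complexity, can be illustrated with a minimal NumPy sketch. This is not the ESSAformer implementation: the abstract does not give the exact form of SCC or of the kernel, so the sketch assumes SCC behaves like a Pearson correlation along the spectral (channel) axis, and it uses a generic positive feature map (`feature_map`, a hypothetical stand-in) only to show why reordering the matrix products gives the same result at lower cost. All function names here are illustrative.

```python
import numpy as np


def scc_similarity(q, k, eps=1e-8):
    """Pearson correlation between each row of q (n, c) and each row of k (m, c).

    Assumption: rows are per-pixel spectral feature vectors and SCC is taken
    to be the Pearson correlation along the channel axis.
    """
    qc = q - q.mean(axis=1, keepdims=True)  # remove per-vector mean
    kc = k - k.mean(axis=1, keepdims=True)
    qn = qc / (np.linalg.norm(qc, axis=1, keepdims=True) + eps)
    kn = kc / (np.linalg.norm(kc, axis=1, keepdims=True) + eps)
    return qn @ kn.T  # shape (n, m), values in [-1, 1]


def feature_map(x):
    """Hypothetical positive feature map (elu(x) + 1), not the paper's kernel."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def quadratic_attention(q, k, v):
    """Kernel attention that materializes the full (n, m) weight matrix."""
    w = feature_map(q) @ feature_map(k).T
    return (w @ v) / w.sum(axis=1, keepdims=True)


def linear_attention(q, k, v):
    """Same result via associativity: phi(Q) @ (phi(K)^T @ V), no (n, m) matrix."""
    fq, fk = feature_map(q), feature_map(k)
    kv = fk.T @ v             # (c, d), cost independent of n
    z = fq @ fk.sum(axis=0)   # (n,), row-wise normalizer
    return (fq @ kv) / z[:, None]


rng = np.random.default_rng(0)
q = rng.normal(size=(6, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 3))
sim = scc_similarity(q, k)
out_quadratic = quadratic_attention(q, k, v)
out_linear = linear_attention(q, k, v)
```

Under these assumptions, the correlation is unchanged when a vector is rescaled or shifted by a constant, which is one sense in which such a metric can be called robust to brightness changes across spectra. The two attention functions return the same values, but the second never forms the token-by-token matrix, so its cost grows linearly with the number of tokens.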