Due to the limitations of current optical and sensor technologies and the high cost of updating them, the spectral and spatial resolution of satellites may not always meet desired requirements. For these reasons, Remote-Sensing Single-Image Super-Resolution (RS-SISR) techniques have gained significant interest. In this paper, we propose the Swin2-MoSE model, an enhanced version of Swin2SR. Our model introduces MoE-SM, an enhanced Mixture-of-Experts (MoE) that replaces the feed-forward network inside every Transformer block. MoE-SM is designed with Smart-Merger, a new layer for merging the outputs of the individual experts, and with a new way of splitting the work between experts, defining a per-example strategy instead of the commonly used per-token one. Furthermore, we analyze how positional encodings interact with each other, demonstrating that per-channel bias and per-head bias can cooperate positively. Finally, we propose to use a combination of Normalized-Cross-Correlation (NCC) and Structural Similarity Index Measure (SSIM) losses, to avoid the typical limitations of the MSE loss. Experimental results demonstrate that Swin2-MoSE outperforms the state of the art (SOTA) by up to 0.377-0.958 dB (PSNR) on the tasks of 2x, 3x and 4x resolution upscaling (Sen2Venus and OLI2MSI datasets). We also show the efficacy of Swin2-MoSE by applying it to a semantic segmentation task (SeasoNet dataset). Code and pretrained models are available at https://github.com/IMPLabUniPr/swin2-mose/tree/official_code
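To make the NCC term mentioned in the abstract concrete, the following is a minimal sketch of a normalized cross-correlation loss, written in plain Python over flattened pixel lists. It is not taken from the Swin2-MoSE repository: the function names (`ncc`, `ncc_loss`, `mse`), the `eps` stabilizer, and the choice of `1 - NCC` as the loss form are all assumptions made for illustration, and the abstract does not state how the NCC and SSIM terms are weighted or combined.

```python
import math


def ncc(pred, target, eps=1e-8):
    """Normalized cross-correlation of two equally sized flat sequences.

    Both inputs are mean-centred, then their dot product is divided by the
    product of their norms. eps (an assumed stabilizer) avoids division by
    zero for constant inputs.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_p = sum(pred) / len(pred)
    mean_t = sum(target) / len(target)
    dp = [p - mean_p for p in pred]
    dt = [t - mean_t for t in target]
    num = sum(a * b for a, b in zip(dp, dt))
    den = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dt))
    return num / (den + eps)


def ncc_loss(pred, target):
    # Assumption: loss = 1 - NCC, so perfect correlation gives a loss near 0.
    return 1.0 - ncc(pred, target)


def mse(pred, target):
    """Plain mean squared error, included only for comparison."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    target = [0.1, 0.4, 0.2, 0.8]
    shifted = [t + 0.5 for t in target]  # same structure, brighter overall
    print("NCC loss:", ncc_loss(shifted, target))
    print("MSE     :", mse(shifted, target))
```

In this toy case the prediction differs from the target only by a constant brightness offset, so the NCC loss stays close to zero while the MSE does not. This illustrates one property that separates the two measures; whether it is among the MSE limitations the authors have in mind is not stated in the abstract.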