Remote sensing (RS) imaging is constrained by hardware cost and physical limitations, making high-quality image acquisition challenging and motivating image fusion as a means of quality enhancement. Multi-frame super-resolution (MFSR) and pansharpening exploit complementary information from multiple frames and multiple sources, respectively, but are usually studied in isolation: MFSR lacks the high-resolution structural priors needed for fine-grained texture recovery, while pansharpening depends on upsampled multispectral images and is sensitive to noise and misalignment. With the rapid development of the Satellite Internet of Things (Sat-IoT), effectively leveraging large numbers of low-quality yet information-complementary images has become increasingly important. To this end, we propose SatFusion, a unified framework for enhancing RS images via joint multi-frame and multi-source fusion. SatFusion employs a Multi-Frame Image Fusion (MFIF) module to extract high-resolution semantic features from multiple low-resolution multispectral frames, and integrates fine-grained structural information from a high-resolution panchromatic image through a Multi-Source Image Fusion (MSIF) module, enabling robust feature integration with implicit pixel-level alignment. To further mitigate the lack of structural priors in multi-frame fusion, we introduce SatFusion*, which incorporates a panchromatic-guided mechanism into the multi-frame fusion stage. By combining structure-aware feature embedding with transformer-based adaptive aggregation, SatFusion* enables spatially adaptive selection of multi-frame features and strengthens the coupling between multi-frame and multi-source representations. Extensive experiments on the WorldStrat, WV3, QB, and GF2 datasets demonstrate that our methods consistently outperform existing approaches in reconstruction quality, robustness, and generalizability.