Synthetic Aperture Radar (SAR) and optical imagery offer complementary strengths, forming a critical foundation for overcoming single-modality constraints and enabling cross-modal collaborative processing and intelligent interpretation. However, existing benchmark datasets often suffer from limitations such as a single spatial resolution, insufficient data scale, and low alignment accuracy, making them inadequate for supporting the training and generalization of multi-scale foundation models. To address these challenges, we introduce SOMA-1M (SAR-Optical Multi-resolution Alignment), a pixel-level precisely aligned dataset containing over 1.3 million pairs of georeferenced images, each 512 × 512 pixels. The dataset integrates imagery from Sentinel-1, PIESAT-1, Capella Space, and Google Earth, achieving global multi-scale coverage at resolutions from 0.5 m to 10 m. It spans 12 typical land cover categories, ensuring scene diversity and complexity. To handle multimodal projection deformation and large-scale data registration, we designed a rigorous coarse-to-fine image matching framework that guarantees pixel-level alignment. Based on this dataset, we established comprehensive evaluation benchmarks for four hierarchical vision tasks — image matching, image fusion, SAR-assisted cloud removal, and cross-modal translation — covering over 30 mainstream algorithms. Experimental results demonstrate that supervised training on SOMA-1M significantly improves performance across all tasks; notably, multimodal remote sensing image (MRSI) matching reaches current state-of-the-art (SOTA) levels. SOMA-1M serves as a foundational resource for robust multimodal algorithms and remote sensing foundation models. The dataset will be released publicly at: https://github.com/PeihaoWu/SOMA-1M.