Pixel-Wise Multimodal Contrastive Learning for Remote Sensing Images

Satellites continuously generate massive volumes of Earth observation data, including satellite image time series (SITS). However, most deep learning models are designed to process either entire images or complete time series to extract meaningful features for downstream tasks. In this study, we propose a novel multimodal approach that leverages pixel-wise two-dimensional (2D) representations to encode visual property variations from SITS more effectively. Specifically, we generate recurrence plots from pixel-based vegetation index time series (NDVI, EVI, and SAVI) as an alternative to using raw pixel values, creating more informative representations. Additionally, we introduce PIxel-wise Multimodal Contrastive (PIMC), a new multimodal self-supervision approach that produces effective encoders based on 2D pixel time series representations and remote sensing imagery (RSI). To validate our approach, we assess its performance on three downstream tasks: pixel-level forecasting and classification on the PASTIS dataset, and land cover classification on the EuroSAT dataset. We also compare our results with state-of-the-art (SOTA) methods on all downstream tasks. Our experimental results show that 2D representations significantly enhance feature extraction from SITS, while contrastive learning improves the quality of representations for both pixel time series and RSI. These findings suggest that our multimodal method outperforms existing models in various Earth observation tasks, establishing it as a robust self-supervision framework for processing both SITS and RSI. Code available on
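To make the two main ingredients of the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It computes NDVI for one pixel, turns the resulting time series into a recurrence plot, and evaluates a symmetric InfoNCE-style loss between two batches of embeddings (one per modality). The function names, the `threshold` and `temperature` parameters, and the choice of InfoNCE as the contrastive objective are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def ndvi(nir, red, eps=1e-8):
    """NDVI = (NIR - Red) / (NIR + Red), computed per time step for one pixel."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)  # eps avoids division by zero


def recurrence_plot(series, threshold=None):
    """Build a (T, T) recurrence plot from a 1D series of length T.

    Entry (i, j) is the absolute difference between time steps i and j.
    If `threshold` (hypothetical parameter) is given, the matrix is binarized:
    1 where the two time steps are within `threshold` of each other, else 0.
    """
    x = np.asarray(series, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])  # pairwise distances via broadcasting
    if threshold is None:
        return dist
    return (dist <= threshold).astype(np.uint8)


def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss (an assumed choice of contrastive objective).

    Row i of `z_a` (e.g. a recurrence-plot embedding) and row i of `z_b`
    (e.g. an image embedding) are treated as the positive pair; all other
    rows in the batch act as negatives.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (N, N) scaled cosine similarities

    def diagonal_cross_entropy(scores):
        scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


if __name__ == "__main__":
    # Synthetic reflectance values for one pixel over six acquisition dates.
    nir = [0.30, 0.45, 0.60, 0.62, 0.40, 0.28]
    red = [0.20, 0.15, 0.08, 0.07, 0.18, 0.21]
    rp = recurrence_plot(ndvi(nir, red), threshold=0.1)
    print(rp.shape)
```

In a full pipeline, the recurrence plots and the image patches would each pass through their own encoder, and the loss would be applied to the two resulting embedding batches; the encoders themselves are outside the scope of this sketch.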
