Satellite image time series in the optical and infrared spectrum suffer from frequent data gaps due to cloud cover, cloud shadows, and temporary sensor outages. How best to reconstruct the missing pixel values and obtain complete, cloud-free image sequences is a long-standing problem in remote sensing research. We approach that problem from the perspective of representation learning and develop U-TILISE, an efficient neural model that implicitly captures spatio-temporal patterns of the spectral intensities and can therefore be trained to map a cloud-masked input sequence to a cloud-free output sequence. The model consists of three parts: a convolutional spatial encoder that maps each frame of the input sequence to a latent encoding; an attention-based temporal encoder that captures dependencies between those per-frame encodings and lets them exchange information along the time dimension; and a convolutional spatial decoder that decodes the latent embeddings back into multi-spectral images. We evaluate the model on EarthNet2021, a dataset of Sentinel-2 time series acquired all over Europe, and find that it reconstructs the missing pixels more accurately than a standard interpolation baseline: PSNR increases by 1.8 dB at previously seen locations and by 1.3 dB at unseen locations.
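To make "exchanging information along the time dimension" concrete, the following is a minimal sketch of scaled dot-product self-attention over a sequence of per-frame latent vectors. It is an illustration only, not the U-TILISE implementation: the function names are hypothetical, each frame is reduced to a single vector, and the learned query/key/value projections, multiple heads, and positional encodings that an attention-based encoder would normally use are replaced by identity mappings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(encodings):
    """Mix T per-frame latent vectors along the time axis.

    encodings: list of T vectors, each a list of d floats (one per frame).
    Assumption: queries, keys, and values are the encodings themselves
    (identity projections); a trained model would learn these projections.
    """
    d = len(encodings[0])
    scale = 1.0 / math.sqrt(d)
    mixed = []
    for query in encodings:
        # Similarity of this frame to every frame in the sequence.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in encodings]
        weights = softmax(scores)
        # Each output vector is a convex combination of all frame encodings.
        mixed.append([sum(w * value[i] for w, value in zip(weights, encodings))
                      for i in range(d)])
    return mixed


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(temporal_self_attention(frames))
```

Because every output vector is a weighted average over all frames, a frame whose encoding carries little information (for example, a heavily clouded one) can borrow content from the other frames in the sequence.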
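As a sketch of how such a comparison can be set up, the code below fills masked time steps of a single pixel's time series by linear interpolation and scores the result with PSNR. The abstract does not specify which interpolation the baseline uses or how PSNR is aggregated, so linear interpolation in time, evaluation on the masked time steps only, and reflectances scaled to [0, 1] are all assumptions made here for illustration; the function names are hypothetical.

```python
import math


def interpolate_gaps(series, valid):
    """Linearly interpolate a per-pixel time series over masked time steps.

    series: list of floats; valid: list of bools (False = cloud-masked).
    Gaps at the start or end of the series copy the nearest valid value.
    """
    known = [t for t, ok in enumerate(valid) if ok]
    if not known:
        raise ValueError("no valid observation to interpolate from")
    filled = list(series)
    for t, ok in enumerate(valid):
        if ok:
            continue
        before = [k for k in known if k < t]
        after = [k for k in known if k > t]
        if before and after:
            t0, t1 = before[-1], after[0]
            w = (t - t0) / (t1 - t0)
            filled[t] = (1 - w) * series[t0] + w * series[t1]
        elif before:
            filled[t] = series[before[-1]]
        else:
            filled[t] = series[after[0]]
    return filled


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB; max_value is the assumed data range."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


truth = [0.2, 0.5, 0.4, 0.5]          # cloud-free reference values
valid = [True, False, False, True]    # time steps 1 and 2 are masked
observed = [0.2, 0.0, 0.0, 0.5]       # masked values are placeholders
filled = interpolate_gaps(observed, valid)
masked = [t for t, ok in enumerate(valid) if not ok]
print(psnr([truth[t] for t in masked], [filled[t] for t in masked]))
```

Since PSNR is logarithmic in the mean squared error, a gain of 1.8 dB corresponds to roughly a one-third reduction in MSE and 1.3 dB to roughly a one-quarter reduction, provided both methods are scored with the same data range and the PSNR is computed from an aggregate MSE.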