Satellite image time series in the optical and infrared spectrum suffer from frequent data gaps due to cloud cover, cloud shadows, and temporary sensor outages. How best to reconstruct the missing pixel values and obtain complete, cloud-free image sequences is a long-standing problem in remote sensing research. We approach this problem from the perspective of representation learning and develop U-TILISE, an efficient neural model that implicitly captures spatio-temporal patterns of the spectral intensities and can therefore be trained to map a cloud-masked input sequence to a cloud-free output sequence. The model consists of a convolutional spatial encoder that maps each individual frame of the input sequence to a latent encoding; an attention-based temporal encoder that captures dependencies between those per-frame encodings and lets them exchange information along the time dimension; and a convolutional spatial decoder that decodes the latent encodings back into multi-spectral images. We experimentally evaluate the proposed model on EarthNet2021, a dataset of Sentinel-2 time series acquired all over Europe, and demonstrate its superior ability to reconstruct the missing pixels. Compared to a standard interpolation baseline, it increases the PSNR by 1.8 dB at previously seen locations and by 1.3 dB at unseen locations.
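To make the evaluation setting concrete, the following is a minimal sketch of a gap-filling baseline and of the PSNR metric for a single pixel's time series. The abstract does not say which interpolation scheme serves as the baseline, nor how the PSNR is normalised or aggregated, so everything here is an assumption for illustration: the function names `interpolate_gaps` and `psnr` are hypothetical, the interpolation is linear in time with nearest-value fill at the sequence edges, and reflectances are assumed to be scaled to a peak value of 1.0.

```python
import math


def interpolate_gaps(values, valid):
    """Fill masked entries of one pixel's time series by linear interpolation.

    `values` holds one intensity per time step and `valid` flags the
    cloud-free steps. This is an assumed baseline, not necessarily the
    one used in the paper.
    """
    known = [t for t, ok in enumerate(valid) if ok]
    if not known:
        raise ValueError("no valid observation to interpolate from")
    filled = list(values)
    for t, ok in enumerate(valid):
        if ok:
            continue
        before = [k for k in known if k < t]
        after = [k for k in known if k > t]
        if before and after:
            # Weighted average of the nearest valid neighbours in time.
            t0, t1 = before[-1], after[0]
            w = (t - t0) / (t1 - t0)
            filled[t] = (1 - w) * values[t0] + w * values[t1]
        else:
            # At the sequence edges, repeat the nearest valid observation.
            filled[t] = values[before[-1] if before else after[0]]
    return filled


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB; `max_value` is an assumed peak."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


# Toy example: a linear trend with two cloud-masked time steps.
truth = [0.2, 0.3, 0.4, 0.5, 0.6]
valid = [True, False, True, False, True]
observed = [v if ok else 0.0 for v, ok in zip(truth, valid)]
filled = interpolate_gaps(observed, valid)
```

Because PSNR is logarithmic in the mean squared error, a gain of 1.8 dB would correspond to roughly a one-third reduction in MSE (a factor of 10^(-0.18) ≈ 0.66), and 1.3 dB to roughly a one-quarter reduction (10^(-0.13) ≈ 0.74), provided both values are derived from a single pooled MSE with the same peak value. If the reported PSNR is instead averaged over per-image values, this conversion holds only approximately.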