Long-term satellite image time series (SITS) analysis in heterogeneous landscapes is challenging, particularly in Mediterranean regions, where complex spatial patterns, seasonal variation, and multi-decade environmental change interact across scales. This paper presents the Spatio-Temporal Transformer for Long Term Forecasting (STT-LTF), a framework that extends purely temporal analysis by integrating spatial context modeling with temporal sequence prediction. STT-LTF processes multi-scale spatial patches alongside temporal sequences of up to 20 years through a unified transformer architecture, capturing both local neighborhood relationships and regional climate influences. The framework is trained with self-supervised learning that combines spatial masking, temporal masking, and horizon sampling, which allows it to learn from 40 years of unlabeled Landsat imagery. Unlike autoregressive approaches, STT-LTF predicts arbitrary future time points directly, so errors do not accumulate across intermediate steps. It combines spatial patch embeddings, cyclical temporal encoding, and geographic coordinates to learn complex dependencies across heterogeneous Mediterranean ecosystems. In an experimental evaluation on Landsat data (1984-2024), STT-LTF achieves a Mean Absolute Error (MAE) of 0.0328 and an R^2 of 0.8412 for next-year predictions, outperforming traditional statistical methods, CNN-based approaches, LSTM networks, and standard transformers. Its ability to handle irregular temporal sampling and variable prediction horizons makes it particularly suitable for analyzing heterogeneous landscapes undergoing rapid ecological transitions.
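The abstract names cyclical temporal encoding as one of the model inputs but does not say how it is computed. A common choice, assumed here purely for illustration, is to map the day of year onto the unit circle with a sine/cosine pair, so that acquisitions in late December and early January receive similar encodings. The function names and the 365.25-day period below are hypothetical and are not taken from the paper.

```python
import math


def cyclical_encoding(day_of_year: float, period: float = 365.25) -> tuple[float, float]:
    """Return (sin, cos) of the seasonal phase for a given day of year.

    Assumption: a single annual cycle; the paper may use a different
    period or additional harmonics.
    """
    angle = 2.0 * math.pi * (day_of_year / period)
    return math.sin(angle), math.cos(angle)


def encoding_distance(day_a: float, day_b: float, period: float = 365.25) -> float:
    """Euclidean distance between two dates in the encoded space."""
    sin_a, cos_a = cyclical_encoding(day_a, period)
    sin_b, cos_b = cyclical_encoding(day_b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


if __name__ == "__main__":
    # 1 January vs 31 December: adjacent in the seasonal cycle.
    print(encoding_distance(1, 365))
    # 1 January vs late June: roughly half a cycle apart.
    print(encoding_distance(1, 180))
```

With this encoding, irregularly sampled acquisition dates can be fed to the model as continuous features, without assuming a fixed revisit interval.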