Predicting precipitation maps is a highly complex spatiotemporal modeling task that is critical for mitigating the impacts of extreme weather events. Short-term precipitation forecasting, or nowcasting, requires models that are not only accurate but also computationally efficient enough for real-time applications. Current methods have notable drawbacks: token-based autoregressive models often suffer from flawed inductive biases and slow inference, while diffusion models can be computationally intensive. To address these limitations, we introduce BlockGPT, a generative autoregressive transformer that uses a batched tokenization (Block) method to predict a full two-dimensional field (frame) at each time step. Conceived as a model-agnostic paradigm for video prediction, BlockGPT factorizes space-time by using self-attention within each frame and causal attention across frames; in this work, we instantiate it for precipitation nowcasting. We evaluate BlockGPT on two precipitation datasets, KNMI (Netherlands) and SEVIR (U.S.), and compare it to state-of-the-art baselines, including token-based (NowcastingGPT) and diffusion-based (DiffCast+Phydnet) models. The results show that BlockGPT achieves superior accuracy, better event localization as measured by categorical metrics, and inference up to 31x faster than comparable baselines.
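The space-time factorization described in the abstract (self-attention within each frame, causal attention across frames) can be illustrated with a block-causal attention mask. The sketch below is one possible reading of that description, not the authors' implementation: the function names `block_causal_mask` and `masked_attention`, the use of NumPy, and the single-head, projection-free attention are all assumptions made for illustration.

```python
import numpy as np


def block_causal_mask(num_frames: int, tokens_per_frame: int) -> np.ndarray:
    """Boolean mask of shape (N, N), with N = num_frames * tokens_per_frame.

    Entry [i, j] is True when query token i may attend to key token j.
    Tokens attend freely inside their own frame and to every token of
    earlier frames, but never to tokens of later frames.
    (Hypothetical helper, not taken from the paper.)
    """
    # Frame index of each token in the flattened sequence, e.g. [0, 0, 1, 1, 2, 2].
    frame_id = np.repeat(np.arange(num_frames), tokens_per_frame)
    return frame_id[:, None] >= frame_id[None, :]


def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention restricted by a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Disallowed positions get -inf so they receive zero weight after softmax.
    scores = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability; every row has at
    # least one allowed position (the token itself), so the maximum is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With this mask, all tokens of a frame share the same context, so the tokens of the next frame can be produced together in one step rather than one token at a time. This is consistent with the frame-level prediction the abstract describes, although the actual BlockGPT architecture may differ in its details.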