Bayesian Transformer for Probabilistic Load Forecasting in Smart Grids

The reliable operation of modern power grids requires probabilistic load forecasts with well-calibrated uncertainty estimates. However, many existing deep learning models produce overconfident point predictions that fail catastrophically under the distributional shifts caused by extreme weather. This study proposes a Bayesian Transformer (BT) framework that integrates three complementary uncertainty mechanisms into a PatchTST backbone: Monte Carlo Dropout for epistemic parameter uncertainty, variational feed-forward layers with log-uniform weight priors, and stochastic attention that perturbs the pre-softmax logits with learnable Gaussian noise. To the best of our knowledge, this is the first application of Bayesian attention to probabilistic load forecasting. A seven-level multi-quantile prediction head trained with the pinball loss, followed by post-training isotonic regression calibration, produces sharp prediction intervals with near-nominal coverage. Evaluation on five grid datasets (PJM, ERCOT, and ENTSO-E data for Germany, France, and Great Britain), augmented with NOAA weather covariates, across 24-, 48-, and 168-hour horizons demonstrates state-of-the-art performance. On the primary benchmark (PJM, H = 24 h), BT achieves a CRPS of 0.0289, a 7.4% improvement over Deep Ensembles and a 29.9% improvement over the deterministic LSTM, with 90.4% PICP at the 90% nominal level and the narrowest prediction intervals (4,960 MW) among all probabilistic baselines. During heat-wave and cold-snap events, BT maintains 89.6% and 90.1% PICP, respectively, versus 64.7% and 67.2% for the deterministic LSTM, indicating that Bayesian epistemic uncertainty widens the intervals for out-of-distribution inputs. Calibration remains stable across all horizons (89.8% to 90.4% PICP), and ablation studies confirm that each component contributes distinct value. The calibrated outputs directly support risk-based reserve sizing, stochastic unit commitment, and demand response activation.
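The stochastic-attention and Monte Carlo Dropout mechanisms can be illustrated with a minimal NumPy sketch. It assumes a single attention head, zero-mean Gaussian noise with standard deviation exp(log_sigma) added to the scaled dot-product logits, dropout on the attention weights that stays active at inference, and a mean-pooling readout to a scalar forecast. The function names, shapes, and readout are hypothetical and are not taken from the BT implementation. In training, log_sigma would be a learnable parameter updated by backpropagation; here it is simply passed in.

```python
import numpy as np

def stochastic_attention(q, k, v, log_sigma, rng, dropout_p=0.1):
    """Hypothetical single-head attention with Gaussian noise on pre-softmax logits.

    q, k, v: (num_patches, d) arrays; log_sigma: log std of the logit noise
    (assumed learnable during training, fixed in this sketch).
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                          # scaled dot-product logits
    logits = logits + np.exp(log_sigma) * rng.standard_normal(logits.shape)
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    # MC Dropout on the attention weights, left active at inference time
    keep = rng.random(weights.shape) >= dropout_p
    weights = weights * keep / (1.0 - dropout_p)
    return weights @ v

def mc_predict(x, w_q, w_k, w_v, w_out, log_sigma, n_samples=100, dropout_p=0.1, seed=0):
    """Repeat the stochastic forward pass; return the sample mean and std."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        h = stochastic_attention(x @ w_q, x @ w_k, x @ w_v, log_sigma, rng, dropout_p)
        samples[i] = h.mean(axis=0) @ w_out                # mean-pool patches -> scalar
    return samples.mean(), samples.std()

# Toy example with assumed sizes: 8 input patches, model width 16
rng = np.random.default_rng(42)
x = rng.standard_normal((8, 16))
w_q, w_k, w_v = (0.1 * rng.standard_normal((16, 16)) for _ in range(3))
w_out = 0.1 * rng.standard_normal(16)
mean, std = mc_predict(x, w_q, w_k, w_v, w_out, log_sigma=0.0)
```

The spread of the repeated samples is a Monte Carlo estimate of the model's predictive variability. Shrinking log_sigma and dropout_p toward zero collapses that spread, which is one way to see how each stochastic component contributes to it. The variational feed-forward layers are omitted here and would add weight-level sampling on top of this.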

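The quantile head's training objective and the evaluation metrics quoted in the abstract (CRPS, PICP, interval width) can be written compactly. The sketch below assumes seven quantile levels symmetric about the median, because the abstract gives the number of levels but not their values. It also assumes that the 90% interval is read from the 0.05 and 0.95 quantiles and that CRPS is estimated from forecast samples with the standard energy-form estimator E|X - y| - 0.5 E|X - X'|. TAUS and the function names are hypothetical, and this is illustrative code, not the evaluation code behind the reported numbers.

```python
import numpy as np

# Assumed quantile levels: the abstract specifies seven levels, not their values.
TAUS = np.array([0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95])

def pinball_loss(y, q_pred, taus=TAUS):
    """Mean pinball loss. y: (n,), q_pred: (n, len(taus)) predicted quantiles."""
    diff = y[:, None] - q_pred                  # > 0 when a quantile under-predicts
    return np.maximum(taus * diff, (taus - 1.0) * diff).mean()

def crps_from_samples(samples, y):
    """Energy-form CRPS estimate for one observation y from forecast samples."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.abs(samples - y).mean()
    term2 = 0.5 * np.abs(samples[:, None] - samples[None, :]).mean()
    return term1 - term2

def picp_and_width(y, lower, upper):
    """Fraction of observations inside [lower, upper] and mean interval width."""
    covered = (y >= lower) & (y <= upper)
    return covered.mean(), (upper - lower).mean()
```

For a 90% interval under these assumptions, the call would be `picp_and_width(y, q_pred[:, 0], q_pred[:, -1])`. A PICP close to 0.90 together with a small mean width is the "sharp and near-nominal" combination the abstract reports.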
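The abstract does not say what the post-training isotonic regression is fitted on. One common recipe, sketched here under that assumption, works on a held-out calibration set. It measures how often the observation falls below each predicted quantile, makes that coverage curve non-decreasing with the pool-adjacent-violators (PAV) algorithm, and then inverts the curve to pick which nominal level to report for a desired coverage. TAUS, the function names, and the linear interpolation between the seven predicted quantiles are all hypothetical choices.

```python
import numpy as np

# Assumed quantile levels of the seven-level head (values not given in the text).
TAUS = np.array([0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95])

def pav(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit, equal weights."""
    blocks = []                                   # each block is [mean, count]
    for v in map(float, values):
        blocks.append([v, 1])
        # Merge backwards while adjacent block means decrease
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return np.concatenate([np.full(n, m) for m, n in blocks])

def fit_coverage_map(y_cal, q_cal):
    """Monotone empirical coverage of each nominal level on a calibration set.
    y_cal: (n,), q_cal: (n, len(TAUS)) predicted quantiles."""
    coverage = (y_cal[:, None] <= q_cal).mean(axis=0)
    return pav(coverage)

def recalibrated_quantile(q_pred, target_tau, fitted_cov, taus=TAUS):
    """Pick the nominal level whose fitted coverage equals target_tau, then read
    that level off each predicted quantile curve by linear interpolation.
    np.interp clamps targets outside the fitted coverage range, and flat
    stretches of fitted_cov make the inverse non-unique."""
    nominal = np.interp(target_tau, fitted_cov, taus)
    return np.array([np.interp(nominal, taus, row) for row in q_pred])
```

If the calibration set shows, for example, that the nominal 0.95 quantile covers only 85% of observations, a target of 0.85 maps back to the 0.95 column. Reaching a true 95% coverage would then require a wider interval than the raw head provides. Applying the same inversion to a lower and an upper level yields a recalibrated central interval.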