Accurate grid load forecasting is safety-critical: under-predictions risk supply shortfalls, while symmetric error metrics can mask this operational asymmetry. We introduce an operator-legible evaluation framework -- Under-Prediction Rate (UPR), tail Reserve$_{99.5}^{\%}$ requirements, and explicit inflation diagnostics (Bias$_{24h}$/OPR) -- to quantify one-sided reliability risk beyond MAPE. Using this framework, we evaluate state space models (Mamba variants) and strong baselines on a weather-aligned California Independent System Operator (CAISO) dataset spanning Nov 2023--Nov 2025 (84,498 hourly records across 5 regional transmission areas) under a rolling-origin walk-forward backtest. We develop and evaluate thermal-lag-aligned weather fusion strategies for these architectures. Our results demonstrate that standard accuracy metrics are insufficient proxies for operational safety: models with comparable MAPE can imply materially different tail reserve requirements (Reserve$_{99.5}^{\%}$). We show that explicit weather integration narrows error distributions, reducing the impact of temperature-driven demand spikes. Furthermore, while probabilistic calibration reduces large-error events, it can induce systematic schedule inflation. We introduce Bias/OPR-constrained objectives to enable auditable trade-offs between minimizing tail risk and preventing trivial over-forecasting.
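The abstract names the one-sided metrics without giving their formulas, so the following is only a minimal sketch under assumed definitions, not the paper's implementation. It takes UPR and OPR to be the fractions of hours with forecast below and above actual load, Bias to be the mean signed error (forecast minus actual), and Reserve$_{99.5}^{\%}$ to be the 99.5th percentile of the under-prediction shortfall expressed as a percentage of actual load. The function name `one_sided_metrics` and the toy load values are hypothetical.

```python
import numpy as np

def one_sided_metrics(actual, forecast, q=99.5):
    """Assumed definitions of the one-sided metrics (not taken from the paper)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = forecast - actual                      # signed error; negative means under-prediction
    shortfall_pct = 100.0 * np.maximum(-err, 0.0) / actual
    return {
        "mape": float(np.mean(np.abs(err) / actual) * 100.0),
        "upr": float(np.mean(err < 0)),          # share of under-predicted hours
        "opr": float(np.mean(err > 0)),          # share of over-predicted hours
        "bias": float(np.mean(err)),             # mean signed error, in load units
        "reserve_pct": float(np.percentile(shortfall_pct, q)),
    }

# Toy data (hypothetical): two forecasts with identical MAPE and zero bias.
actual = np.array([100.0, 100.0, 100.0, 100.0])
model_a = np.array([95.0, 105.0, 95.0, 105.0])   # frequent small misses
model_b = np.array([90.0, 110.0, 100.0, 100.0])  # one large shortfall

a = one_sided_metrics(actual, model_a)
b = one_sided_metrics(actual, model_b)
print(a)
print(b)
```

On this toy data both forecasts have the same MAPE and zero bias, yet the second implies a larger tail reserve because its single under-prediction is twice as deep, which is the kind of divergence that a symmetric metric alone would not reveal.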