While accuracy is a critical requirement for time series forecasting, an equally important desideratum is forecast stability across forecast creation dates (FCDs). Even highly accurate models can produce erratic revisions between FCDs, disrupting downstream decision-making. To reduce the volatility of such revisions, several state-of-the-art models, including MQCNN, MQT, and SPADE, employ a powerful yet underexplored neural network architectural design known as forking-sequences. This design jointly encodes and decodes the entire time series across all FCDs, producing the full multi-horizon forecast grid in a single forward pass. It contrasts with conventional neural forecasting methods that process FCDs independently, generating only a single multi-horizon forecast per forward pass. In this work, we formalize the forking-sequences design and motivate its broader adoption by introducing a metric that quantifies excess volatility in forecast revisions and by providing theoretical and empirical analysis. We theoretically motivate three key benefits of forking-sequences: (i) increased forecast stability through ensembling; (ii) gradient variance reduction, leading to more stable and consistent training steps; and (iii) improved computational efficiency during inference. We validate the benefits of forking-sequences against a window-sampling baseline on the M-series benchmark, using 16 datasets from the M1, M3, M4, and Tourism competitions. We observe median accuracy improvements across datasets of 29.7%, 46.2%, 49.3%, 28.6%, 24.7%, and 6.4% for MLP, RNN, LSTM, CNN, Transformer, and StateSpace-based architectures, respectively. We then show that forecast ensembling during inference can improve median forecast stability by 10.8%, 13.2%, 13.0%, 10.9%, 10.2%, and 11.2% for these respective models trained with forking-sequences, while maintaining accuracy.
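The core idea of forking-sequences, and the inference-efficiency claim above, can be illustrated with a minimal sketch. The toy RNN, its dimensions, and parameter names below are all hypothetical stand-ins, not the paper's actual MQCNN/MQT/SPADE networks: a causal encoder is run once over the series, and a shared decoder head is "forked" at every step (every FCD) to emit the entire T x H forecast grid in one pass, whereas the window-sampling baseline re-encodes a separate window per FCD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: T time steps (one FCD per step), H horizons, D hidden units.
T, H, D = 12, 4, 8

# Shared parameters of a hypothetical toy model (not the paper's networks).
W_in = rng.normal(scale=0.1, size=(1, D))    # input -> hidden
W_rec = rng.normal(scale=0.1, size=(D, D))   # hidden -> hidden
W_out = rng.normal(scale=0.1, size=(D, H))   # hidden -> H-step forecast head

y = rng.normal(size=T)  # a toy univariate series


def encode(series):
    """Run the causal RNN over `series`, returning the hidden state at every step."""
    h = np.zeros(D)
    states = []
    for t in range(len(series)):
        h = np.tanh(series[t] * W_in[0] + h @ W_rec)
        states.append(h)
    return np.stack(states)  # shape (len(series), D)


# Forking-sequences: one encoder pass, then the shared decoder is applied at
# every step, yielding the full T x H forecast grid in a single forward pass.
grid_fs = encode(y) @ W_out  # shape (T, H)

# Window-sampling baseline: re-encode a window ending at each FCD separately,
# producing one H-step forecast per forward pass.
grid_ws = np.stack([encode(y[: t + 1])[-1] @ W_out for t in range(T)])

# With full-history windows and a causal encoder the two grids coincide;
# forking-sequences simply avoids the T redundant encoder passes.
assert np.allclose(grid_fs, grid_ws)
```

In this full-history setting the outputs are identical and only the compute differs (one encoder pass versus T); the paper's further stability benefits come from training on all FCDs jointly and from ensembling forecasts across nearby FCDs at inference.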