Forecasting time series with extreme events is critical yet challenging because such events are sparse but high-impact, with high variance and irregular dynamics. Existing methods excel at modeling dominant regular patterns, but their performance degrades significantly during extreme events, and this degradation is the primary source of forecasting errors in real-world applications. Although some approaches incorporate auxiliary signals to improve performance, they still fail to capture the complex temporal dynamics of extreme events. To address these limitations, we propose M$^2$FMoE, an extreme-adaptive forecasting model that learns both regular and extreme patterns through multi-resolution and multi-view frequency modeling. It comprises three modules: (1) a multi-view frequency mixture-of-experts module that assigns experts to distinct spectral bands in the Fourier and wavelet domains, with a cross-view shared band splitter that aligns frequency partitions and enables inter-expert collaboration to capture both dominant and rare fluctuations; (2) a multi-resolution adaptive fusion module that hierarchically aggregates frequency features from coarse to fine resolutions, enhancing sensitivity to both short-term variations and sudden changes; and (3) a temporal gating integration module that dynamically balances long-term trends and short-term frequency-aware features, improving adaptability to both regular and extreme temporal patterns. Experiments on real-world hydrological datasets with extreme patterns demonstrate that M$^2$FMoE outperforms state-of-the-art baselines without requiring extreme-event labels.
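To make the idea of assigning experts to spectral bands concrete, the following is a minimal NumPy sketch of a band splitter in the Fourier view only. It is an illustration under stated assumptions, not the M$^2$FMoE implementation: the function names `split_bands` and `mix_bands`, the hand-picked band edges, the synthetic series, and the fixed gate logits are all hypothetical, and each "expert" is reduced to an identity map on its band. The wavelet view, learned band boundaries, and learned gating described in the abstract are omitted.

```python
import numpy as np


def split_bands(x, edges):
    """Split a 1-D series into band-limited components via the real FFT.

    `edges` is an increasing list of rFFT bin indices that starts at 0 and
    ends at len(x) // 2 + 1, so the bands tile the whole spectrum.
    (Hypothetical helper; the band edges here are fixed, not learned.)
    """
    spec = np.fft.rfft(x)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spec)
        masked[lo:hi] = spec[lo:hi]          # keep only this band's bins
        bands.append(np.fft.irfft(masked, n=len(x)))
    return np.stack(bands)                   # shape: (num_bands, len(x))


def mix_bands(bands, gate_logits):
    """Combine per-band outputs with softmax gate weights.

    Each expert is the identity here; a real model would transform its band
    before mixing. (Assumption made for brevity.)
    """
    z = np.asarray(gate_logits, dtype=float)
    w = np.exp(z - z.max())
    w = w / w.sum()
    return w @ bands


# Synthetic series: a slow oscillation, small noise, and one injected spike
# standing in for an extreme event.
rng = np.random.default_rng(0)
t = np.arange(256)
x = np.sin(2 * np.pi * t / 64) + 0.1 * rng.standard_normal(256)
x[200] += 5.0

n_bins = len(x) // 2 + 1
edges = [0, 8, 32, n_bins]                   # low / mid / high bands
bands = split_bands(x, edges)
mixed = mix_bands(bands, [0.0, 0.0, 0.0])    # equal gate weights
```

Because the bands tile the spectrum without overlap, summing them recovers the original series, so any expert that degrades one band leaves the others untouched. The injected spike spreads its energy across all bands, which is one intuition for why high-frequency experts matter for sudden changes.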