Long-term time series forecasting (LTSF) aims to predict future values of a time series given its past values. The current state-of-the-art (SOTA) on this problem is attained in some cases by linear-centric models, which primarily feature a linear mapping layer. However, due to their inherent simplicity, they are unable to adapt their prediction rules to periodic changes in time series patterns. To address this challenge, we propose Mixture-of-Linear-Experts (MoLE), a Mixture-of-Experts-style augmentation for linear-centric models. Instead of training a single model, MoLE trains multiple linear-centric models (i.e., experts) and a router model that weighs and mixes their outputs. While the entire framework is trained end-to-end, each expert learns to specialize in a specific temporal pattern, and the router model learns to compose the experts adaptively. Experiments show that MoLE reduces the forecasting error of linear-centric models, including DLinear, RLinear, and RMLP, in over 78% of the datasets and settings we evaluated. By using MoLE, existing linear-centric models achieve SOTA LTSF results in 68% of the experiments that PatchTST reports and we compare to, whereas existing single-head linear-centric models achieve SOTA results in only 25% of cases. Additionally, MoLE models achieve SOTA in all settings for the newly released Weather2K datasets.
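The mixture described above can be sketched as follows: each expert is a linear map from the look-back window to the forecast horizon, and a router produces softmax mixing weights over the experts from the same input. This is a minimal NumPy sketch with random (untrained) weights; the shapes, the linear router, and all variable names are assumptions for illustration, not the paper's exact implementation (in practice all parameters would be learned end-to-end).

```python
import numpy as np

rng = np.random.default_rng(0)

L, T, K = 96, 24, 3  # look-back length, forecast horizon, number of experts

# Each expert is a linear map from the look-back window to the horizon
# (weights here are random placeholders; in MoLE they are trained end-to-end).
expert_W = rng.normal(size=(K, T, L)) * 0.01
expert_b = np.zeros((K, T))

# Router: here assumed to be a linear layer over the input followed by a
# softmax, producing one mixing weight per expert.
router_W = rng.normal(size=(K, L)) * 0.01
router_b = np.zeros(K)

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def mole_forward(x):
    """Forecast the next T steps from a look-back window x of length L."""
    preds = expert_W @ x + expert_b          # (K, T): one forecast per expert
    gate = softmax(router_W @ x + router_b)  # (K,): mixing weights, sum to 1
    return gate @ preds                      # (T,): weighted mixture of experts

x = rng.normal(size=L)       # a single input window
y_hat = mole_forward(x)      # mixed forecast of length T
```

Because the gate depends on the input, different windows can be routed to different specialists, which is what lets the mixture adapt to periodic pattern changes that a single linear head cannot capture.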