The immense success of the Transformer architecture in Natural Language Processing has led to its adoption in Time Series Forecasting (TSF), where it has shown superior performance. However, an influential recent paper questioned its effectiveness by demonstrating that a simple single-layer linear model outperforms Transformer-based models. This claim was in turn countered by a stronger Transformer-based model, PatchTST. More recently, TimeLLM demonstrated even better results by repurposing a Large Language Model (LLM) for the TSF domain. A follow-up paper again challenged this by demonstrating that removing the LLM component, or replacing it with a basic attention layer, in fact yields better performance. One of the challenges in forecasting is that TSF data favors the more recent past and is sometimes subject to unpredictable events. Based upon these recent insights in TSF, we propose a strong Mixture of Experts (MoE) framework. Our method combines state-of-the-art (SOTA) models, including xLSTM, enhanced Linear, PatchTST, and minGRU, among others. This set of complementary and diverse models for TSF is integrated in a Transformer-based MoE gating network. Our proposed model outperforms all existing TSF models on standard benchmarks, surpassing even the latest approaches based on MoE frameworks.
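To make the gating idea concrete, the following is a minimal sketch (not the paper's implementation) of a Transformer-encoder gate that produces a softmax weighting over the forecasts of several expert models. All class and parameter names (`SimpleLinearExpert`, `MoEForecaster`, `d_model`, etc.) are illustrative assumptions, and the single-linear expert merely stands in for the actual xLSTM, PatchTST, and minGRU experts.

```python
# Hedged sketch of a Transformer-gated Mixture of Experts for univariate TSF.
import torch
import torch.nn as nn


class SimpleLinearExpert(nn.Module):
    """Stand-in expert: a single linear map from the lookback window to the horizon."""
    def __init__(self, lookback: int, horizon: int):
        super().__init__()
        self.proj = nn.Linear(lookback, horizon)

    def forward(self, x):            # x: (batch, lookback)
        return self.proj(x)          # (batch, horizon)


class MoEForecaster(nn.Module):
    """Transformer encoder summarizes the lookback window and weights expert forecasts."""
    def __init__(self, experts, lookback: int, horizon: int, d_model: int = 64):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        self.embed = nn.Linear(1, d_model)   # embed each time step as a token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.gate = nn.Linear(d_model, len(experts))

    def forward(self, x):                     # x: (batch, lookback)
        tokens = self.embed(x.unsqueeze(-1))  # (batch, lookback, d_model)
        summary = self.encoder(tokens).mean(dim=1)            # (batch, d_model)
        weights = torch.softmax(self.gate(summary), dim=-1)   # (batch, n_experts)
        forecasts = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, n_experts, horizon)
        return (weights.unsqueeze(-1) * forecasts).sum(dim=1)         # (batch, horizon)


lookback, horizon = 96, 24
model = MoEForecaster([SimpleLinearExpert(lookback, horizon) for _ in range(4)],
                      lookback, horizon)
print(model(torch.randn(8, lookback)).shape)  # torch.Size([8, 24])
```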