Time series analysis is widely used in real-world applications such as weather forecasting, financial fraud detection, imputation of missing data in IoT systems, and classification for action recognition. Mixture-of-Experts (MoE), although a powerful architecture that has proven effective in NLP, still falls short in adapting to the versatile tasks of time series analytics, because its router is task-agnostic and it lacks the capability to model channel correlations. In this study, we propose PatchMoE, a novel, general MoE-based time series framework that is task-aware: it supports the intricate utilization of ``knowledge'' for distinct tasks. Based on the observation that hierarchical representations often vary across tasks (e.g., forecasting vs. classification), we propose Recurrent Noisy Gating, which exploits hierarchical information during routing to obtain task-specific capability. The routing strategy operates on time series tokens along both the temporal and channel dimensions, and a meticulously designed Temporal \& Channel Load Balancing Loss encourages it to model the intricate temporal and channel correlations. Comprehensive experiments on five downstream tasks demonstrate that PatchMoE achieves state-of-the-art performance.
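The abstract names Recurrent Noisy Gating and a Temporal \& Channel Load Balancing Loss but does not give their formulas. The Python sketch below is purely illustrative and is not PatchMoE's implementation. It assumes the standard noisy top-k gating of Shazeer et al. (2017), adds a hypothetical router state `h` carried from one layer to the next as one possible reading of "recurrent", and assumes a squared-coefficient-of-variation balance loss applied separately across the patch tokens of each channel (temporal term) and across the channels at each patch position (channel term). All function and parameter names (`noisy_topk_gate`, `temporal_channel_balance_loss`, `params`, `w_x`, `w_h`, `w_gate`, `w_noise`) are hypothetical.

```python
import math
import random

# HYPOTHETICAL sketch, not the PatchMoE implementation. Noisy top-k gating
# follows Shazeer et al. (2017); the recurrent router state and the
# temporal/channel split of the balance loss are illustrative assumptions.

def softplus(z):
    # Numerically stable log(1 + e^z)
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def matvec(w, x):
    # w: list of rows (out_dim x in_dim); x: vector of length in_dim
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def noisy_topk_gate(x, h_prev, params, k, rng, train=True):
    """Route one token; h_prev is the (assumed) router state from the previous layer."""
    w_x, w_h, w_gate, w_noise = params
    # Assumed recurrent update: mix the current token with the previous layer's state
    h = [math.tanh(a + b) for a, b in zip(matvec(w_x, x), matvec(w_h, h_prev))]
    logits = matvec(w_gate, h)
    if train:
        # Add Gaussian noise with a learned, token-dependent scale
        scale = [softplus(v) for v in matvec(w_noise, h)]
        logits = [g + rng.gauss(0.0, 1.0) * s for g, s in zip(logits, scale)]
    # Keep the k largest logits, softmax over them, zero out the rest
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    gates = [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]
    return gates, h

def cv_squared(values):
    # Squared coefficient of variation: 0 when load is perfectly even
    mean = sum(values) / len(values)
    if mean == 0.0:
        return 0.0
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return var / mean ** 2

def temporal_channel_balance_loss(gates):
    """gates[c][t][e]: gate of expert e for the patch token at (channel c, patch t)."""
    n_ch, n_patch, n_exp = len(gates), len(gates[0]), len(gates[0][0])
    # Temporal term: balance expert importance across the patches of each channel
    temporal = [cv_squared([sum(gates[c][t][e] for t in range(n_patch))
                            for e in range(n_exp)]) for c in range(n_ch)]
    # Channel term: balance expert importance across channels at each patch position
    channel = [cv_squared([sum(gates[c][t][e] for c in range(n_ch))
                           for e in range(n_exp)]) for t in range(n_patch)]
    return sum(temporal) / n_ch + sum(channel) / n_patch

# Toy usage: 3 channels x 4 patch tokens, 4 experts, top-2 routing, 2 layers
rng = random.Random(0)
C, N, D, H, E, K = 3, 4, 4, 4, 4, 2

def rand_mat(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

params = (rand_mat(H, D), rand_mat(H, H), rand_mat(E, H), rand_mat(E, H))
tokens = [[[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)] for _ in range(C)]
state = [[[0.0] * H for _ in range(N)] for _ in range(C)]
for layer in range(2):
    # Expert computation is omitted; the same tokens and gate weights are reused
    gates = [[None] * N for _ in range(C)]
    for c in range(C):
        for t in range(N):
            gates[c][t], state[c][t] = noisy_topk_gate(
                tokens[c][t], state[c][t], params, K, rng)
    print(f"layer {layer}: balance loss = {temporal_channel_balance_loss(gates):.4f}")
```

The two loss terms are simply added here; how PatchMoE actually weights or combines them, and how its gating uses hierarchical information, is not stated in the abstract.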