Deep models for Multivariate Time Series (MTS) forecasting have recently demonstrated significant success. Channel-dependent models capture complex dependencies that channel-independent models cannot. However, the number of channels in real-world applications outpaces the capabilities of existing channel-dependent models, and, contrary to common expectations, some of these models underperform channel-independent models on high-dimensional data, which raises questions about the performance of channel-dependent models. To address this, our study first investigates the reasons behind the suboptimal performance of channel-dependent models on high-dimensional MTS data. Our analysis reveals two primary issues: noise introduced by unrelated series, which makes the crucial inter-channel dependencies harder to capture, and challenges in training strategies caused by high dimensionality. To address these issues, we propose STHD, a Scalable Transformer for High-Dimensional Multivariate Time Series Forecasting. STHD has three components: a) Relation Matrix Sparsity, which limits the noise introduced and alleviates the memory issue; b) ReIndex, a training strategy that enables more flexible batch-size settings and increases the diversity of training data; and c) a Transformer that handles 2-D inputs and captures channel dependencies. Together, these components enable STHD to manage high-dimensional MTS while maintaining computational feasibility. Furthermore, experimental results show STHD's considerable improvement on three high-dimensional datasets: Crime-Chicago, Wiki-People, and Traffic. The source code and dataset are publicly available at https://github.com/xinzzzhou/ScalableTransformer4HighDimensionMTSF.git.
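To make the Relation Matrix Sparsity idea concrete, here is a minimal sketch of one plausible realization: for each channel, keep only its top-k most correlated peers and mask out the rest, so downstream attention only attends to related series. The function name, the use of Pearson correlation, and the top-k rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sparsify_relation_matrix(x: np.ndarray, k: int) -> np.ndarray:
    """Build a sparse 0/1 relation mask over channels.

    x: array of shape (num_channels, seq_len), one series per channel.
    Returns a (num_channels, num_channels) mask that keeps, for each
    channel, only its k most correlated channels (including itself).
    NOTE: a hypothetical sketch -- the actual STHD sparsification
    may differ.
    """
    corr = np.corrcoef(x)                          # (C, C) Pearson correlations
    mask = np.zeros_like(corr)
    # Sort each row by descending |correlation| and keep the first k.
    topk = np.argsort(-np.abs(corr), axis=1)[:, :k]
    rows = np.arange(corr.shape[0])[:, None]
    mask[rows, topk] = 1.0
    return mask
```

Restricting each channel to k peers reduces both the noise from unrelated series and the memory cost, since attention over channels then scales with C·k rather than C².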