Transformer-based models have shown strong performance in time-series forecasting by leveraging self-attention to model long-range temporal dependencies. However, their effectiveness depends critically on the quality and structure of input representations derived from raw multivariate time-series data, particularly as sequence length and data scale increase. This paper proposes a two-stage forecasting framework that explicitly separates local temporal representation learning from global dependency modelling. In the proposed approach, a convolutional neural network operates on fixed-length temporal patches to extract short-range temporal dynamics and non-linear feature interactions, producing compact patch-level token embeddings. Token-level self-attention is applied during representation learning to refine these embeddings, after which a Transformer encoder models inter-patch temporal dependencies to generate forecasts. The method is evaluated on a synthetic multivariate time-series dataset with controlled static and dynamic factors, using an extended sequence length and a larger number of samples. Experimental results demonstrate that the proposed framework consistently outperforms a convolutional baseline under increased temporal context and remains competitive with a strong patch-based Transformer model. These findings indicate that structured patch-level tokenization provides a scalable and effective representation for multivariate time-series forecasting, particularly when longer input sequences are considered.
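The pipeline described above can be illustrated with a minimal, self-contained sketch. All dimensions, function names, and the toy convolution/attention implementations below are illustrative assumptions, not the paper's actual architecture: a multivariate series is split into fixed-length patches, a small convolution with pooling turns each patch into one token embedding, and single-head self-attention refines the patch tokens before they would be passed to a Transformer encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_patches(series, patch_len):
    """Split a (time, channels) series into non-overlapping fixed-length patches."""
    t, c = series.shape
    n = t // patch_len
    return series[: n * patch_len].reshape(n, patch_len, c)

def conv_embed(patches, kernels, bias):
    """Toy 1-D convolution + ReLU + mean pooling per patch -> one token per patch.

    kernels has shape (kernel_size, channels, d_model); this stands in for the
    CNN stage that extracts short-range dynamics within each patch.
    """
    n, p, c = patches.shape
    k, _, d = kernels.shape
    out = np.zeros((n, p - k + 1, d))
    for i in range(p - k + 1):
        window = patches[:, i : i + k, :]                 # (n, k, c)
        out[:, i, :] = np.einsum("nkc,kcd->nd", window, kernels)
    return np.maximum(out + bias, 0.0).mean(axis=1)       # (n, d_model)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over patch tokens."""
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens                               # refined (n, d_model)

series = rng.normal(size=(96, 4))                 # 96 time steps, 4 variables
patches = make_patches(series, patch_len=16)      # (6, 16, 4): 6 patch tokens
kernels = rng.normal(size=(3, 4, 8)) * 0.1        # kernel_size=3, d_model=8
tokens = conv_embed(patches, kernels, bias=np.zeros(8))   # (6, 8)
refined = self_attention(tokens)                          # (6, 8)
print(patches.shape, tokens.shape, refined.shape)
```

In a full implementation the refined patch tokens would feed a Transformer encoder that models inter-patch dependencies and produces the forecast; the separation shown here mirrors the abstract's split between local representation learning and global dependency modelling.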