Transformer-based models have shown strong performance in time-series forecasting by leveraging self-attention to model long-range temporal dependencies. However, their effectiveness depends critically on the quality and structure of input representations derived from raw multivariate time-series data, particularly as sequence length and data scale increase. This paper proposes a two-stage forecasting framework that explicitly separates local temporal representation learning from global dependency modelling. In the proposed approach, a convolutional neural network operates on fixed-length temporal patches to extract short-range temporal dynamics and non-linear feature interactions, producing compact patch-level token embeddings. Token-level self-attention is applied during representation learning to refine these embeddings, after which a Transformer encoder models inter-patch temporal dependencies to generate forecasts. The method is evaluated on a synthetic multivariate time-series dataset with controlled static and dynamic factors, using an extended sequence length and a larger number of samples. Experimental results demonstrate that the proposed framework consistently outperforms a convolutional baseline under increased temporal context and remains competitive with a strong patch-based Transformer model. These findings indicate that structured patch-level tokenization provides a scalable and effective representation for multivariate time-series forecasting, particularly when longer input sequences are considered.
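The two-stage pipeline described above (fixed-length patch tokenization, token-level self-attention for refinement, then inter-patch dependency modelling before a forecasting head) can be sketched as follows. This is a minimal NumPy illustration of the data flow only: all shapes, the random weights, the linear patch encoder standing in for the paper's CNN, and the single attention pass standing in for the full Transformer encoder are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

T, C = 96, 4   # input length and number of variates (assumed)
P = 16         # fixed patch length (assumed)
D = 8          # patch-token embedding dimension (assumed)
H = 24         # forecast horizon (assumed)

x = rng.standard_normal((T, C))            # one multivariate series

# Stage 1: split the series into fixed-length temporal patches and encode
# each patch into a compact token. A linear map stands in for the CNN here.
patches = x.reshape(T // P, P, C)          # (num_patches, P, C)
W_local = rng.standard_normal((P * C, D)) * 0.1
tokens = patches.reshape(T // P, -1) @ W_local   # (num_patches, D)

def self_attention(tok):
    """Single-head scaled dot-product self-attention over patch tokens."""
    scores = tok @ tok.T / np.sqrt(tok.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tok

# Token-level self-attention refines the patch embeddings during
# representation learning.
refined = self_attention(tokens)           # (num_patches, D)

# Stage 2: a Transformer encoder models inter-patch temporal dependencies;
# one further attention pass stands in for it in this sketch.
context = self_attention(refined)          # (num_patches, D)

# Linear forecasting head over the flattened patch representations (assumed).
W_head = rng.standard_normal((context.size, H * C)) * 0.1
forecast = (context.reshape(-1) @ W_head).reshape(H, C)
print(forecast.shape)   # (24, 4)
```

The explicit separation is visible in the code: `W_local` and the first `self_attention` call only ever see information local to (or exchanged between) patch tokens, while the second stage operates purely on the compact token sequence, which is what keeps attention cost manageable as the raw sequence length `T` grows.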