Temporal Saliency Detection Towards Explainable Transformer-based Timeseries Forecasting

From arXiv. Published at the International Workshop on Explainable and Interpretable Machine Learning (XI-ML), 26th European Conference on Artificial Intelligence (ECAI 2023).

Despite notable advances in Transformer-based models, long multi-horizon time series forecasting remains a persistent challenge, especially with respect to explainability. Focusing on the saliency maps commonly used to explain deep neural networks (DNNs), we aim to build an attention-based architecture that automatically encodes saliency-related temporal patterns by establishing connections with appropriate attention heads. To this end, this paper introduces Temporal Saliency Detection (TSD), an approach that builds on the attention mechanism and applies it to multi-horizon time series prediction. While the proposed architecture adheres to the general encoder-decoder structure, its encoder is substantially redesigned: it incorporates a series of information-contracting and information-expanding blocks inspired by the U-Net architecture. TSD facilitates multiresolution analysis of saliency patterns by condensing multiple attention heads, thereby progressively enhancing the forecasting of complex time series data. Empirical evaluations show that the proposed approach outperforms other models across multiple standard benchmark datasets in diverse far-horizon forecasting settings. The initial TSD achieves substantial relative improvements of 31% and 46% over several models in multivariate and univariate prediction, respectively. We believe the comprehensive investigations presented in this study will offer valuable insights and benefits to future research endeavors.
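The abstract reports relative improvements of 31% and 46% but does not say which error metric or formula they are based on. The sketch below shows how such a figure is commonly computed, assuming a lower-is-better error metric such as MSE and the usual definition (baseline error minus model error, divided by baseline error). The function name `relative_improvement` and the error values are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the paper.

```python
def relative_improvement(baseline_error: float, model_error: float) -> float:
    """Fractional error reduction relative to a baseline (lower error is better).

    Assumed definition: (baseline - model) / baseline. The paper's exact
    formula is not given in the abstract.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - model_error) / baseline_error


# Hypothetical error values, not results from the paper.
baseline_mse = 1.00
model_mse = 0.69

print(f"{relative_improvement(baseline_mse, model_mse):.0%}")  # prints 31%
```

Under this definition, a model whose error equals the baseline's scores 0%, and a model with higher error than the baseline scores a negative value.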
