Horizon Activation Mapping for Neural Networks in Time Series Forecasting

Model selection for neural networks in time series forecasting has relied on error metrics and architecture-specific interpretability approaches that do not transfer across model families. To interpret forecasting models independently of the layer types used across state-of-the-art model families, we introduce Horizon Activation Mapping (HAM), a visual interpretability technique inspired by Grad-CAM that uses gradient norm averages to study subseries of the forecast horizon, where Grad-CAM studies attention maps over image data. We introduce causal and anti-causal modes that compute gradient update norm averages across subseries at every timestep, together with lines of proportionality that signify a uniform distribution of the norm averages. We study the optimization landscape under changes in batch size, early stopping, train-validation-test splits, architectural choices, univariate forecasting and dropout, relating each to model performance and to the subseries in HAM. Interestingly, batch-size-based differences in activity seem to indicate that an exponential approximation may exist across batch sizes, per epoch, relative to each other. The HAM plots in this study use multivariate forecasting models trained on the ETTm2 dataset over different horizon sizes: the MLP-based CycleNet, N-Linear and N-HITS; the self-attention-based FEDformer and Pyraformer; the SSM-based SpaceTime; and the diffusion-based Multi-Resolution DDPM. Trends in the HAM plots over the training, validation and test sets are attributed to N-HITS' neural approximation theorem and to SpaceTime's exponential autoregressive activities. In general, HAM can be used for granular model selection, validation set choices and comparisons across different neural network model families.
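
The causal and anti-causal modes described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the method as defined in the paper: the forecaster is a toy linear map `W` so that gradients can be written in closed form, the causal mode is taken to restrict a squared-error loss to horizon steps `0..h`, the anti-causal mode to steps `h..H-1`, per-sample gradient norms are averaged over the batch, and the line of proportionality is taken to be a straight line scaled to the full-horizon norm. The names `subseries_grad_norm`, `ham_curves` and `proportionality_line` are hypothetical.

```python
import math


def subseries_grad_norm(x, residual, steps):
    """L2 norm of the gradient of sum_{h in steps} residual[h]**2 w.r.t. W,
    for a toy linear forecaster pred[h] = sum_l x[l] * W[l][h]."""
    # d/dW[l][h] = 2 * residual[h] * x[l] when h is in `steps`, else 0.
    return math.sqrt(sum((2.0 * residual[h] * xl) ** 2 for h in steps for xl in x))


def ham_curves(batch_x, batch_y, W):
    """Batch-averaged gradient norms per horizon step, in both modes (assumed definitions)."""
    horizon = len(batch_y[0])
    causal = [0.0] * horizon
    anti_causal = [0.0] * horizon
    for x, y in zip(batch_x, batch_y):
        pred = [sum(xl * row[h] for xl, row in zip(x, W)) for h in range(horizon)]
        residual = [p - t for p, t in zip(pred, y)]
        for h in range(horizon):
            # Causal mode: loss restricted to horizon steps 0..h.
            causal[h] += subseries_grad_norm(x, residual, range(h + 1))
            # Anti-causal mode: loss restricted to horizon steps h..horizon-1.
            anti_causal[h] += subseries_grad_norm(x, residual, range(h, horizon))
    n = len(batch_x)
    return [c / n for c in causal], [a / n for a in anti_causal]


def proportionality_line(full_norm, horizon, causal=True):
    """Straight reference line scaled to the full-horizon norm (assumed definition)."""
    fractions = [(h + 1) / horizon for h in range(horizon)]
    if not causal:
        fractions.reverse()
    return [full_norm * f for f in fractions]


# Toy data: two samples, lookback of 2, horizon of 3, zero-initialised weights.
batch_x = [[3.0, 4.0], [0.0, 1.0]]
batch_y = [[1.0, 1.0, 1.0], [2.0, 0.0, 0.0]]
W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

causal, anti_causal = ham_curves(batch_x, batch_y, W)
causal_ref = proportionality_line(causal[-1], len(causal))
```

Under these assumptions, the two curves meet at the full horizon (`causal[-1]` equals `anti_causal[0]`), and comparing each curve against its reference line shows which part of the horizon contributes more or less gradient activity than a uniform spread would. For a real network the closed-form gradient would be replaced by automatic differentiation over the model parameters.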
