Time Series Foundation Models (TSFMs) leverage extensive pretraining to accurately predict unseen time series at inference time, without task-specific fine-tuning. Through large-scale evaluations on standard benchmarks, we find that leading transformer-based TSFMs exhibit redundant components in their intermediate layers. We introduce a set of tools for mechanistic interpretability of TSFMs, including ablations of specific components and direct logit attribution on the residual stream. Our findings are consistent across several leading TSFMs with diverse architectures, and across a wide range of real-world and synthetic time-series datasets. We discover that all models in our study are robust to ablations of entire layers. Furthermore, we develop a theoretical framework that casts transformers as kernel regressors, motivating a purely intrinsic strategy for ablating heads based on the stable rank of the per-head projection matrices. Using this approach, we uncover the specific heads responsible for degenerate phenomena widely observed in TSFMs, such as parroting of motifs from the context and seasonality bias. Our study sheds light on the universal properties of this emerging class of architectures for continuous-time sequence modeling.
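For concreteness, the stable rank referenced above is the standard spectral quantity (stated here as background, not as a result of the paper): for a per-head projection matrix $W$ with singular values $\sigma_1 \ge \sigma_2 \ge \dots$,

\[
\operatorname{srank}(W) \;=\; \frac{\lVert W \rVert_F^2}{\lVert W \rVert_2^2} \;=\; \frac{\sum_i \sigma_i^2}{\sigma_1^2},
\]

which is at most $\operatorname{rank}(W)$ and is small when the projection is dominated by a few directions. A minimal sketch of how such an intrinsic criterion could be computed follows; the function names are hypothetical, and targeting low-stable-rank heads is an illustrative assumption rather than the paper's exact procedure:

```python
import numpy as np

def stable_rank(W: np.ndarray) -> float:
    """Stable rank: squared Frobenius norm over squared spectral norm."""
    sv = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    return float(np.sum(sv ** 2) / sv[0] ** 2)

def rank_heads_by_stable_rank(per_head_W: list) -> list:
    """Order head indices by the stable rank of their projection matrix
    (ascending): a purely intrinsic score requiring no forward passes."""
    scores = [stable_rank(W) for W in per_head_W]
    return sorted(range(len(scores)), key=lambda i: scores[i])

# Example: score four random 64x64 per-head projections (placeholder data).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heads = [rng.normal(size=(64, 64)) for _ in range(4)]
    print(rank_heads_by_stable_rank(heads))
```

Because the score depends only on the weights, heads can be ranked once offline and ablated in order, in contrast to activation-based criteria that require running the model on data.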