Time series foundation models (TSFMs) have recently gained significant attention due to their strong zero-shot capabilities and widespread real-world applications. Such models typically require computationally costly pre-training on large-scale, carefully curated collections of real-world sequences. To enable sample-efficient pre-training of TSFMs, we propose \textsc{CauKer}, a novel algorithm that generates diverse, causally coherent synthetic time series with realistic trends, seasonality, and nonlinear interactions. \textsc{CauKer} combines Gaussian Process (GP) kernel composition with Structural Causal Models (SCMs) to produce data for sample-efficient pre-training of state-of-the-art classification TSFMs spanning different architectures and pre-training approaches. Additionally, our experiments reveal that \textsc{CauKer}-generated datasets exhibit clear scaling laws for both dataset size (10K to 10M samples) and model capacity (1M to 783M parameters), unlike real-world datasets, which display irregular scaling behavior. The source code is publicly available at https://github.com/ShifengXIE/CauKer.
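To make the core idea concrete, the following is a minimal, self-contained sketch of combining GP kernel composition with an SCM. The specific kernels (linear, periodic, RBF), the sum/product composition, the three-node causal graph, and the tanh mechanism are all illustrative assumptions for this sketch, not \textsc{CauKer}'s actual design: root nodes are drawn from GPs with composed kernels (giving trend and seasonality), and child nodes are nonlinear functions of their parents plus noise (giving causal coherence).

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 128)

# --- GP kernel composition (illustrative kernel choices) ---
def rbf(t, ls=0.2):
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def periodic(t, period=0.25, ls=0.5):
    d = np.abs(t[:, None] - t[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ls ** 2)

def linear(t):
    return np.outer(t, t)

# Sums and products of valid kernels are valid kernels, so we can
# compose trend (linear), seasonality (periodic), and local
# variation (RBF) into one covariance.
K = linear(t) + periodic(t) * rbf(t, ls=0.5)

def sample_gp(K, jitter=1e-6):
    # Draw one GP sample path via the Cholesky factor of the covariance.
    L = np.linalg.cholesky(K + jitter * np.eye(len(K)))
    return L @ rng.standard_normal(len(K))

# --- SCM over a small DAG: roots are GP draws, children are
# --- nonlinear mixes of their parents plus independent noise.
x1 = sample_gp(K)                  # root: trend + seasonality
x2 = sample_gp(rbf(t, ls=0.1))     # root: fast local variation
x3 = np.tanh(0.8 * x1 + 0.5 * x2) + 0.1 * rng.standard_normal(len(t))

sample = np.stack([x1, x2, x3])    # one causally coherent synthetic sample
```

Resampling kernel hyperparameters, the composition structure, and the DAG on each draw is what would make such a generator produce a diverse pre-training corpus rather than variations of a single process.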