It is challenging to scale time series forecasting models so that they forecast accurately across multiple distinct domains and datasets, each with potentially different underlying collection procedures (e.g., sample resolution), patterns (e.g., periodicity), and prediction requirements (e.g., reconstruction vs. forecasting). We call this general task universal forecasting. Existing methods usually assume that input data is regularly sampled and forecast to pre-determined horizons, so they fail to generalise outside the scope of their training. We propose the DAM, a neural model that takes randomly sampled histories and outputs an adjustable basis composition as a continuous function of time, which allows forecasting to non-fixed horizons. It involves three key components: (1) a flexible approach for using randomly sampled histories drawn from a long-tail distribution, which enables an efficient global perspective of the underlying temporal dynamics while retaining focus on the recent history; (2) a transformer backbone that is trained on these actively sampled histories to produce, as representational output, (3) the basis coefficients of a continuous function of time. We show that a single univariate DAM, trained on 25 time series datasets, either outperformed or closely matched existing SoTA models at multivariate long-term forecasting across 18 datasets, including 8 held out for zero-shot transfer, even though these models were trained to specialise for each dataset-horizon combination. This single DAM excels at zero-shot transfer and very-long-term forecasting, performs well at imputation, is interpretable via basis function composition and attention, can be tuned for different inference-cost requirements, and is robust to missing and irregularly sampled data by design.
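As a rough illustration of components (1) and (3), the sketch below samples an irregular history from a long-tail distribution over past time offsets and evaluates a basis composition at arbitrary future times. It is not the DAM itself: the Lomax (Pareto II) sampler, the sinusoidal basis with hand-picked periods, and the function names are all assumptions made for this example, and an ordinary least-squares fit stands in for the transformer backbone that would produce the basis coefficients in the actual model.

```python
import numpy as np


def sample_history_offsets(n_samples, max_lag, scale, rng):
    """Draw irregular time offsets <= 0 (0 = now) with a long tail into the past.

    Assumption: a Lomax draw stands in for the "long-tail distribution";
    the abstract does not specify which distribution is used.
    """
    lags = rng.pareto(1.5, size=n_samples) * scale
    lags = np.minimum(lags, max_lag)  # clip to the available history
    return -lags


def basis_matrix(t, periods):
    """Evaluate a constant plus sine/cosine pairs at (possibly irregular) times t."""
    t = np.asarray(t, dtype=float)[:, None]
    omega = 2.0 * np.pi / np.asarray(periods, dtype=float)[None, :]
    ones = np.ones((t.shape[0], 1))
    return np.concatenate([ones, np.sin(omega * t), np.cos(omega * t)], axis=1)


def fit_coefficients(t, y, periods):
    """Least-squares stand-in for the model that outputs basis coefficients."""
    coef, *_ = np.linalg.lstsq(basis_matrix(t, periods), y, rcond=None)
    return coef


def evaluate(coef, t, periods):
    """Continuous function of time: valid for any t, past or future."""
    return basis_matrix(t, periods) @ coef


def signal(t):
    """Toy noiseless series with daily (24) and weekly (168) periodicity."""
    t = np.asarray(t, dtype=float)
    return 1.0 + np.sin(2 * np.pi * t / 24) + 0.5 * np.cos(2 * np.pi * t / 168)


rng = np.random.default_rng(0)
periods = [24.0, 168.0]  # hypothetical basis periods chosen for the toy signal

# Most samples land near "now"; a few reach far into the past.
t_hist = sample_history_offsets(256, max_lag=2000.0, scale=48.0, rng=rng)
coef = fit_coefficients(t_hist, signal(t_hist), periods)

# The horizon is not fixed: query any future times, regular or not.
t_future = np.array([1.0, 36.5, 720.0])
forecast = evaluate(coef, t_future, periods)
```

Because the output is a function of time rather than a fixed-length vector, the same coefficients can be queried at past times for imputation or at distant future times for long-horizon forecasts, which is the property the abstract attributes to the basis composition.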