In real-world domains such as traffic and energy, large-scale time series with missing values and noise are common, and they are often sampled irregularly. Many imputation methods have been proposed, but most work with a local horizon: models are trained by splitting the long sequence into batches of suitably sized patches. This local horizon can cause models to ignore global trends and periodic patterns. More importantly, almost all methods assume that observations are sampled at regular time stamps, so they cannot handle the complex, irregularly sampled time series that arise in different applications. In addition, most existing methods are trained offline, which makes them unsuitable for applications with fast-arriving streaming data. To overcome these limitations, we propose BayOTIDE: Bayesian Online Multivariate Time series Imputation with functional decomposition. We treat the multivariate time series as a weighted combination of groups of low-rank temporal factors with different patterns. We apply a group of Gaussian Processes (GPs) with different kernels as functional priors to fit the factors. For computational efficiency, we further convert the GPs into a state-space prior by constructing an equivalent stochastic differential equation (SDE), and we develop a scalable algorithm for online inference. The proposed method can not only impute values at arbitrary time stamps, but also offers uncertainty quantification and interpretability for downstream applications. We evaluate our method on both synthetic and real-world datasets.
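The conversion from a GP prior to a state-space prior can be illustrated on the simplest case. The sketch below is not the proposed algorithm: it assumes a single latent factor with a Matérn-1/2 kernel, k(τ) = σ² exp(−|τ|/ℓ), whose equivalent SDE is the Ornstein-Uhlenbeck process, and it runs a standard scalar Kalman filter over irregular time stamps. The function name `ou_kalman_filter`, its parameters, and the convention of marking missing values with NaN are all hypothetical choices made for illustration.

```python
import numpy as np


def ou_kalman_filter(times, values, lengthscale=1.0, variance=1.0, noise_var=0.1):
    """Online filtering for a GP with a Matern-1/2 kernel in state-space form.

    times  : increasing 1-D array of (possibly irregular) time stamps
    values : observations at those times; NaN marks a missing value
    Returns the filtering mean and variance of the latent function at each time.
    """
    # Stationary distribution of the OU process serves as the prior at the first time stamp.
    mean, var = 0.0, variance
    prev_t = None
    means, variances = [], []
    for t, y in zip(times, values):
        if prev_t is not None:
            # Exact discretisation of the SDE over the gap (t - prev_t).
            a = np.exp(-(t - prev_t) / lengthscale)
            mean = a * mean
            var = a * a * var + variance * (1.0 - a * a)
        if not np.isnan(y):
            # Standard Kalman update with Gaussian observation noise.
            gain = var / (var + noise_var)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
        # When y is missing, the predicted mean acts as the imputed value.
        prev_t = t
        means.append(mean)
        variances.append(var)
    return np.array(means), np.array(variances)


# Irregular time stamps with one missing observation.
demo_times = np.array([0.0, 0.4, 1.7, 2.1, 4.5])
demo_values = np.array([0.1, 0.5, np.nan, 0.9, -0.8])
demo_means, demo_vars = ou_kalman_filter(demo_times, demo_values)
```

Each step costs constant time and only needs the previous state, which is what makes this form suitable for streaming data. The returned variances give a per-time-stamp uncertainty estimate; at a missing entry the variance is not reduced by an update, so it reflects the extra uncertainty of the imputed value.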