In real-world domains such as traffic and energy, large-scale time series with missing values and noise are common, and they are often sampled at irregular intervals. Although many imputation methods have been proposed, they share three limitations. First, most of them work with a local horizon: models are trained by splitting a long sequence into batches of suitably sized patches, which can cause them to ignore global trends and periodic patterns. Second, and more importantly, almost all methods assume that observations are sampled at regular time stamps, so they cannot handle the complex, irregularly sampled time series that arise in different applications. Third, most existing methods are trained offline, which makes them unsuitable for applications with fast-arriving streaming data. To overcome these limitations, we propose BayOTIDE: Bayesian Online Multivariate Time series Imputation with functional decomposition. We treat the multivariate time series as a weighted combination of groups of low-rank temporal factors with different patterns. We apply a group of Gaussian Processes (GPs) with different kernels as functional priors to fit the factors. For computational efficiency, we further convert the GPs into a state-space prior by constructing an equivalent stochastic differential equation (SDE), and we develop a scalable algorithm for online inference. The proposed method can not only handle imputation at arbitrary time stamps, but also offers uncertainty quantification and interpretability for downstream applications. We evaluate our method on both synthetic and real-world datasets.
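To make the GP-to-state-space conversion concrete, the following minimal sketch covers the simplest case only and is not the BayOTIDE algorithm itself. It assumes a single scalar factor with a Matérn-1/2 kernel, k(t, t') = variance * exp(-|t - t'| / lengthscale), whose equivalent SDE is the Ornstein-Uhlenbeck process. The function name `ou_kalman_filter` and its parameters are hypothetical, chosen for illustration. Because the transition is computed from the actual gap between time stamps, the filter handles irregular sampling and processes each observation once, in streaming order.

```python
import math


def ou_kalman_filter(times, values, lengthscale=1.0, variance=1.0, noise=0.1):
    """Online filtering for a GP with a Matern-1/2 kernel, written as a
    scalar state-space model (illustrative sketch, hypothetical names).

    times  : increasing time stamps, possibly irregularly spaced
    values : observations; None marks a missing value to be imputed
    noise  : variance of the Gaussian observation noise
    Returns a list of (mean, variance) of the latent function at each time,
    conditioned on all observations seen so far.
    """
    mean, var = 0.0, variance  # stationary prior of the OU process
    prev_t = None
    filtered = []
    for t, y in zip(times, values):
        if prev_t is not None:
            # Exact discretisation of the SDE over the gap (t - prev_t).
            a = math.exp(-(t - prev_t) / lengthscale)
            mean = a * mean
            var = a * a * var + variance * (1.0 - a * a)
        if y is not None:
            # Standard Kalman update with a Gaussian likelihood.
            gain = var / (var + noise)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
        # With y missing, the prediction itself serves as the imputation.
        filtered.append((mean, var))
        prev_t = t
    return filtered


if __name__ == "__main__":
    stamps = [0.0, 0.7, 0.9, 2.5]       # irregular time stamps
    obs = [1.0, 2.0, None, 0.5]         # third value is missing
    for t, (m, v) in zip(stamps, ou_kalman_filter(stamps, obs)):
        print(f"t={t:.1f}  mean={m:.3f}  var={v:.3f}")
```

Each step costs constant time, whereas exact GP regression on N points costs O(N^3); at the most recent time stamp the filtered mean and variance coincide with the full GP posterior under this kernel. The returned variance is the kind of uncertainty estimate the abstract refers to.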