The imputation of missing values is a significant obstacle in many real-world data analysis pipelines. Here, we focus on time series data and put forward SSSD, an imputation model that relies on two emerging technologies: (conditional) diffusion models as state-of-the-art generative models, and structured state space models as the internal model architecture, which are particularly suited to capturing long-term dependencies in time series data. We demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic imputation and forecasting performance on a broad range of data sets and missingness scenarios, including the challenging blackout-missing scenario, where prior approaches failed to provide meaningful results.
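To make the missingness scenarios concrete, the following minimal sketch contrasts point-wise random missingness with blackout missingness. It assumes that "blackout missing" means one contiguous window that is unobserved in all channels at once; the function names, array shapes, and parameters are hypothetical illustrations and are not taken from the SSSD implementation.

```python
import numpy as np


def random_missing_mask(length, channels, ratio, rng):
    """Observation mask (True = observed) with points dropped independently.

    Hypothetical helper: each entry is missing with probability `ratio`.
    """
    return rng.random((length, channels)) >= ratio


def blackout_missing_mask(length, channels, gap, rng):
    """Observation mask (True = observed) with one shared contiguous gap.

    Assumption: a blackout removes the same `gap` consecutive time steps
    from every channel, so no channel can be used to fill in another.
    """
    mask = np.ones((length, channels), dtype=bool)
    start = rng.integers(0, length - gap + 1)  # gap always fits in the series
    mask[start:start + gap, :] = False
    return mask


rng = np.random.default_rng(0)
rm = random_missing_mask(length=100, channels=4, ratio=0.2, rng=rng)
bm = blackout_missing_mask(length=100, channels=4, gap=20, rng=rng)

# In the blackout case every channel loses exactly the same time steps.
missing_steps = np.flatnonzero(~bm[:, 0])
print("blackout window:", missing_steps[0], "to", missing_steps[-1])
```

Under this assumption, a conditional imputation model would receive the observed entries together with the mask as conditioning information and generate values only where the mask is False. The blackout case is harder because no neighbouring channel carries information about the missing window.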