Imputing missing values is a significant obstacle in many real-world data analysis pipelines. Here, we focus on time series data and put forward SSSD, an imputation model that relies on two emerging technologies: (conditional) diffusion models as state-of-the-art generative models, and structured state space models as the internal model architecture, which are particularly suited to capturing long-term dependencies in time series data. We demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic imputation and forecasting performance on a broad range of data sets and missingness scenarios, including the challenging blackout-missing scenarios, where prior approaches failed to provide meaningful results.
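To make the missingness scenarios mentioned above concrete, the following minimal sketch builds two observation masks for a multichannel time series. It assumes that "blackout missing" means a single contiguous window that is missing in every channel at once, and that the simpler contrast case drops entries independently at random; both readings are assumptions made for illustration. The function names, the 1 = observed / 0 = missing convention, and all parameters are hypothetical and are not taken from the SSSD implementation.

```python
import random


def random_missing_mask(length, channels, missing_ratio, rng):
    """Return a length x channels mask (1 = observed, 0 = missing).

    Each entry is dropped independently with probability `missing_ratio`.
    """
    return [
        [0 if rng.random() < missing_ratio else 1 for _ in range(channels)]
        for _ in range(length)
    ]


def blackout_missing_mask(length, channels, gap_length, rng):
    """Return a length x channels mask (1 = observed, 0 = missing).

    Assumption: a blackout is one contiguous window of `gap_length` time
    steps that is missing in all channels simultaneously.
    """
    if not 0 < gap_length <= length:
        raise ValueError("gap_length must be between 1 and length")
    # Choose the first missing time step so the window fits in the series.
    start = rng.randrange(length - gap_length + 1)
    return [
        [0 if start <= t < start + gap_length else 1 for _ in range(channels)]
        for t in range(length)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    rm = random_missing_mask(length=100, channels=3, missing_ratio=0.2, rng=rng)
    bm = blackout_missing_mask(length=100, channels=3, gap_length=20, rng=rng)
    print("random missing, fraction missing:",
          sum(v == 0 for row in rm for v in row) / 300)
    print("blackout missing, time steps missing:",
          sum(row[0] == 0 for row in bm))
```

Under these assumptions, the blackout case is harder because no channel carries any information inside the gap, so an imputation model conditioned on such a mask has to rely on temporal context on either side of the window rather than on correlated channels.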