Missing data in time series is a pervasive problem that hinders advanced analysis. A popular solution is imputation, where the fundamental challenge is determining which values to fill in. This paper proposes SAITS, a novel method based on the self-attention mechanism for missing value imputation in multivariate time series. Trained with a joint-optimization approach, SAITS learns missing values from a weighted combination of two diagonally-masked self-attention (DMSA) blocks. DMSA explicitly captures both the temporal dependencies and the feature correlations between time steps, which improves imputation accuracy and training speed. Meanwhile, the weighted-combination design enables SAITS to dynamically assign weights to the representations learned by the two DMSA blocks according to the attention map and the missingness information. Extensive experiments demonstrate, quantitatively and qualitatively, that SAITS efficiently outperforms state-of-the-art methods on the time-series imputation task, and reveal SAITS' potential to improve the learning performance of pattern recognition models on incomplete real-world time-series data. The code is open source on GitHub at https://github.com/WenjieDu/SAITS.
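To make the idea of diagonal masking concrete, the following is a minimal single-head sketch in NumPy, not the SAITS implementation. It assumes that "diagonally masked" means each time step is prevented from attending to itself, so its representation must be built from the other time steps. The function name, the projection matrices `w_q`, `w_k`, `w_v`, and the toy dimensions are all hypothetical; the joint-optimization training and the weighted combination of the two blocks are not shown.

```python
import numpy as np


def diagonally_masked_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over time steps with the diagonal masked out.

    x:        (T, d_model) array, one row per time step (T >= 2).
    w_q, w_k: (d_model, d_k) hypothetical projection matrices.
    w_v:      (d_model, d_v) hypothetical projection matrix.
    Returns the attended output (T, d_v) and the attention weights (T, T).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])       # (T, T) scaled dot products
    np.fill_diagonal(scores, -np.inf)             # step t may not attend to itself
    scores -= scores.max(axis=-1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)                      # exp(-inf) = 0 on the diagonal
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy usage with random data (illustrative values only).
rng = np.random.default_rng(0)
T, d_model, d_k = 5, 4, 3
x = rng.normal(size=(T, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = diagonally_masked_self_attention(x, w_q, w_k, w_v)
```

In this sketch every row of `attn` sums to one while its diagonal entry is exactly zero, so the output for a given time step depends only on the other time steps.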