Missingness is ubiquitous in multivariate time series and poses an obstacle to reliable downstream analysis. Although recurrent-network imputation has achieved state-of-the-art (SOTA) results, existing models do not scale to the deep architectures that could alleviate issues arising in complex data. Moreover, imputation carries the risk of biased estimates of the ground truth, yet confidence in the imputed values is typically either unmeasured or computed post hoc from model output. We propose DEep Attention Recurrent Imputation (DEARI), which jointly estimates missing values and their associated uncertainty in heterogeneous multivariate time series. To jointly represent feature-wise correlations and temporal dynamics, we adopt a self-attention mechanism together with an effective residual component, yielding a deep recurrent neural network with good imputation performance and stable convergence. We also leverage self-supervised metric learning to boost performance by optimizing sample similarity. Finally, we transform DEARI into a Bayesian neural network through a novel Bayesian marginalization strategy, producing stochastic DEARI, which outperforms its deterministic equivalent. Experiments show that DEARI surpasses the SOTA in diverse imputation tasks on real-world datasets from three domains: air quality control, healthcare, and traffic.
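As a rough illustration of what "jointly estimating missing values and their uncertainty" can mean in practice, the sketch below draws repeated samples from a stochastic imputer, keeps observed entries untouched, and reports the per-entry mean and spread. It is a generic Monte Carlo illustration under stated assumptions, not the DEARI model or its Bayesian marginalization strategy: `toy_stochastic_imputer`, `combine`, and `impute_with_uncertainty` are hypothetical names, and the toy imputer (mean of the observed values plus Gaussian noise) merely stands in for a trained stochastic network.

```python
import random
import statistics


def combine(observed, mask, estimate):
    """Keep observed entries (mask == 1); fill missing ones (mask == 0) with the estimate."""
    return [x if m == 1 else e for x, m, e in zip(observed, mask, estimate)]


def toy_stochastic_imputer(observed, mask, rng):
    """Hypothetical stand-in for a stochastic model: mean of observed values plus noise."""
    seen = [x for x, m in zip(observed, mask) if m == 1]
    centre = statistics.fmean(seen)
    return [centre + rng.gauss(0.0, 0.1) for _ in observed]


def impute_with_uncertainty(observed, mask, n_samples=200, seed=0):
    """Average several stochastic imputations; the spread serves as an uncertainty proxy."""
    rng = random.Random(seed)
    draws = [
        combine(observed, mask, toy_stochastic_imputer(observed, mask, rng))
        for _ in range(n_samples)
    ]
    columns = list(zip(*draws))  # one tuple of sampled values per time step
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return means, stds


# One univariate series with the middle value missing (0.0 is only a placeholder).
means, stds = impute_with_uncertainty([1.0, 0.0, 3.0], [1, 0, 1])
```

In this toy setting the observed entries come back unchanged with zero spread, while the missing entry receives an estimate near the mean of the observed values together with a non-zero standard deviation, so the uncertainty is produced alongside the imputation rather than computed separately afterwards.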