Medical time series datasets frequently contain missing values that require data imputation; however, conventional machine learning models fall short because they do not quantify the uncertainty in their predictions. Among these models, CATSI (Context-Aware Time Series Imputation) stands out for its effectiveness, incorporating a context vector into the imputation process that captures the global dependencies of each patient. In this paper, we propose a Bayesian Context-Aware Time Series Imputation (Bayes-CATSI) framework that leverages the uncertainty quantification offered by variational inference. We consider time series derived from electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), and electrocardiography (EKG). Variational inference assumes a parametric form for the posterior distribution and, by minimizing the Kullback-Leibler (KL) divergence, finds the variational density closest to the true posterior. Accordingly, we integrate variational Bayesian deep learning layers into the CATSI model. Our results show that Bayes-CATSI not only provides uncertainty quantification but also achieves superior imputation performance compared to CATSI; specifically, one instance of Bayes-CATSI outperforms CATSI by 9.57%. We provide an open-source code implementation for applying Bayes-CATSI to other medical data imputation problems.
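The core mechanism can be illustrated with a minimal sketch, not the paper's actual implementation: a variational Bayesian linear layer whose weights are Gaussian distributions rather than point estimates. Sampling the weights per forward pass (the reparameterization trick) yields predictive uncertainty, and the closed-form KL divergence between the Gaussian variational posterior and a standard-normal prior is the term minimized during training. All names (`BayesLinear`, `kl_gaussians`) and hyperparameters here are illustrative assumptions, using only NumPy for self-containment.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_gaussians(mu, log_sigma):
    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over all weights.
    sigma2 = np.exp(2.0 * log_sigma)
    return 0.5 * np.sum(sigma2 + mu**2 - 1.0 - 2.0 * log_sigma)

class BayesLinear:
    """Illustrative variational Bayesian linear layer: each weight is a
    Gaussian N(mu, sigma^2); a fresh weight sample is drawn per forward pass."""

    def __init__(self, n_in, n_out):
        # Variational parameters of the weight posterior (assumed init values).
        self.mu = rng.normal(0.0, 0.1, (n_in, n_out))
        self.log_sigma = np.full((n_in, n_out), -3.0)

    def forward(self, x):
        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, I).
        eps = rng.standard_normal(self.mu.shape)
        w = self.mu + np.exp(self.log_sigma) * eps
        return x @ w

    def kl(self):
        # KL term added to the imputation loss during training.
        return kl_gaussians(self.mu, self.log_sigma)

layer = BayesLinear(4, 2)
x = np.ones((1, 4))
# Repeated stochastic forward passes give a predictive distribution,
# whose spread is the uncertainty estimate point-estimate models lack.
samples = np.stack([layer.forward(x) for _ in range(100)])
print("predictive mean:", samples.mean(axis=0))
print("predictive std: ", samples.std(axis=0))
print("KL term:", layer.kl())
```

In a full model such as Bayes-CATSI, layers of this kind replace deterministic ones, and the KL terms of all layers are summed into the training objective alongside the imputation loss.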