Medical time series datasets often contain missing values that require data imputation methods; however, conventional machine learning models fall short because they lack uncertainty quantification in their predictions. Among these models, CATSI (Context-Aware Time Series Imputation) stands out for its effectiveness, incorporating a context vector into the imputation process to capture the global dependencies of each patient. In this paper, we propose a Bayesian Context-Aware Time Series Imputation (Bayes-CATSI) framework that leverages the uncertainty quantification offered by variational inference. We consider time series derived from electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), and electrocardiography (EKG). Variational inference assumes the shape of the posterior distribution and, by minimizing the Kullback-Leibler (KL) divergence, finds the variational densities closest to the true posterior distribution. Accordingly, we integrate variational Bayesian deep learning layers into the CATSI model. Our results show that Bayes-CATSI not only provides uncertainty quantification but also achieves superior imputation performance compared to the CATSI model. Specifically, an instance of Bayes-CATSI outperforms CATSI by 9.57%. We provide an open-source code implementation for applying Bayes-CATSI to other medical data imputation problems.
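The variational inference step described above can be illustrated with a minimal sketch. This is not code from the paper: it only shows the two ingredients a variational Bayesian layer typically relies on when the variational posterior over a weight is a Gaussian N(mu, sigma^2) with a standard normal prior — the closed-form KL divergence term that is minimized during training, and a reparameterized weight sample that keeps the sampling step differentiable. The function names are illustrative, not from the Bayes-CATSI implementation.

```python
import math
import random

def kl_gaussian_std_normal(mu: float, sigma: float) -> float:
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ).

    This is the per-weight KL penalty a variational Bayesian layer
    adds to its loss when using a standard normal prior.
    """
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0 - 2.0 * math.log(sigma))

def reparameterized_sample(mu: float, sigma: float) -> float:
    """Draw w = mu + sigma * eps with eps ~ N(0, 1).

    The reparameterization trick: randomness is isolated in eps, so
    gradients can flow through mu and sigma during optimization.
    """
    eps = random.gauss(0.0, 1.0)
    return mu + sigma * eps

# When the variational posterior matches the prior, the KL term vanishes.
print(kl_gaussian_std_normal(0.0, 1.0))  # 0.0
```

Summing this KL term over all weights and adding it to the data-fit loss yields the evidence lower bound (ELBO) objective that variational inference optimizes.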