Missing data is a common obstacle in time-series analysis across many fields, including healthcare and environmental monitoring. Multiple Imputation by Chained Equations (MICE) is a prominent approach that imputes missing values through fully conditional specification, in which each incomplete variable is modelled conditionally on the others. We extend MICE within a Bayesian framework (tBayes-MICE), using Markov Chain Monte Carlo (MCMC) sampling to impute missing values while accounting for uncertainty in both the MICE model parameters and the imputed values. To respect the sequential nature of time-series data, the model also includes temporally informed initialisation and time-lagged features. We evaluate tBayes-MICE on two real-world datasets (AirQuality and PhysioNet) with two samplers: Random Walk Metropolis (RWM) and the Metropolis-Adjusted Langevin Algorithm (MALA). Our results show that tBayes-MICE reduces imputation error relative to the baseline methods across all variables and accounts for uncertainty in the imputation process, thereby providing a more accurate measure of imputation error. We also find that MALA mixes better than RWM for most variables, achieving comparable accuracy while exploring the posterior more consistently. Overall, these findings suggest that tBayes-MICE is a practical and efficient approach to time-series imputation that balances improved accuracy with meaningful uncertainty quantification in environmental and clinical settings.
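To make the sampling step concrete, the following is a minimal sketch of a single conditional imputation model with RWM and MALA, not the authors' implementation. It rests on several illustrative assumptions that do not come from the abstract: a synthetic AR(1) series in place of AirQuality or PhysioNet, a linear-Gaussian conditional model with known noise scale `sigma`, a Gaussian prior with scale `tau`, linear interpolation as the temporally informed initialisation, and a single lag-1 feature. All function and variable names are hypothetical.

```python
import numpy as np

def log_post(beta, X, y, sigma=1.0, tau=10.0):
    # Gaussian likelihood (known sigma, assumed) plus N(0, tau^2) prior on beta.
    resid = y - X @ beta
    return -0.5 * resid @ resid / sigma**2 - 0.5 * beta @ beta / tau**2

def grad_log_post(beta, X, y, sigma=1.0, tau=10.0):
    return X.T @ (y - X @ beta) / sigma**2 - beta / tau**2

def rwm(logp, theta0, n_samples, step, rng):
    # Random Walk Metropolis: symmetric Gaussian proposal.
    theta = np.asarray(theta0, dtype=float)
    lp = logp(theta)
    chain = np.empty((n_samples, theta.size))
    accepted = 0
    for i in range(n_samples):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = logp(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_samples

def mala(logp, grad, theta0, n_samples, step, rng):
    # MALA: gradient-informed proposal with a Metropolis-Hastings correction.
    theta = np.asarray(theta0, dtype=float)
    lp, g = logp(theta), grad(theta)
    chain = np.empty((n_samples, theta.size))
    accepted = 0
    for i in range(n_samples):
        mean_fwd = theta + 0.5 * step**2 * g
        prop = mean_fwd + step * rng.standard_normal(theta.size)
        lp_prop, g_prop = logp(prop), grad(prop)
        mean_rev = prop + 0.5 * step**2 * g_prop
        log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (2 * step**2)
        log_q_rev = -np.sum((theta - mean_rev) ** 2) / (2 * step**2)
        if np.log(rng.uniform()) < lp_prop - lp + log_q_rev - log_q_fwd:
            theta, lp, g = prop, lp_prop, g_prop
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_samples

# Synthetic AR(1) series with 45 values removed at random (illustrative data).
rng = np.random.default_rng(0)
n = 300
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()
missing = np.zeros(n, dtype=bool)
missing[rng.choice(np.arange(1, n), size=45, replace=False)] = True
y_obs = np.where(missing, np.nan, y)

# Assumed initialisation: linear interpolation over time.
t_idx = np.arange(n)
y_fill = y_obs.copy()
y_fill[missing] = np.interp(t_idx[missing], t_idx[~missing], y_obs[~missing])

# Design matrix: intercept plus lag-1 feature; fit only on rows with observed targets.
X = np.column_stack([np.ones(n - 1), y_fill[:-1]])
target = y_fill[1:]
obs_rows = ~missing[1:]
Xo, yo = X[obs_rows], target[obs_rows]

def logp(b):
    return log_post(b, Xo, yo)

def grad(b):
    return grad_log_post(b, Xo, yo)

chain_rwm, acc_rwm = rwm(logp, np.zeros(2), 4000, 0.05, rng)
chain_mala, acc_mala = mala(logp, grad, np.zeros(2), 4000, 0.05, rng)

# Posterior predictive imputations: discard burn-in, thin, then add observation noise.
draws = chain_mala[1000::30]
Xm = X[~obs_rows]
imputations = Xm @ draws.T + rng.standard_normal((Xm.shape[0], draws.shape[0]))
imputed_mean = imputations.mean(axis=1)
imputed_sd = imputations.std(axis=1)
```

Each column of `imputations` is one plausible completion of the missing entries, so `imputed_sd` reflects both parameter uncertainty and observation noise. In a full chained-equations procedure, one would presumably cycle this step over every incomplete variable and refresh the lagged features after each pass; the sketch covers only one variable and one pass.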