Missing data routinely complicate time-series analysis in fields such as healthcare and environmental monitoring. Multiple Imputation by Chained Equations (MICE) is a prominent method that imputes missing values through "fully conditional specification". We extend MICE within a Bayesian framework (Bayes-MICE), using Markov Chain Monte Carlo (MCMC) sampling to impute missing values while accounting for uncertainty in both the MICE model parameters and the imputed values. We also include temporally informed initialisation and time-lagged features in the model to respect the sequential nature of time-series data. We evaluate Bayes-MICE on two real-world datasets (AirQuality and PhysioNet) with two samplers, Random Walk Metropolis (RWM) and the Metropolis-Adjusted Langevin Algorithm (MALA). Our results demonstrate that Bayes-MICE reduces imputation errors relative to the baseline methods over all variables and accounts for uncertainty in the imputation process, thereby providing a more accurate measure of imputation error. We also find that MALA converges faster than RWM, achieving comparable accuracy while providing more consistent posterior exploration. Overall, these findings suggest that Bayes-MICE is a practical and efficient approach to time-series imputation, balancing increased accuracy with meaningful quantification of uncertainty in environmental and clinical settings.
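To make the ingredients named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It imputes a single series with one conditional model, so it omits the chained cycling over several variables that MICE performs. Several choices are assumptions made purely for illustration: a Gaussian linear model with a known noise standard deviation `SIGMA`, a zero-mean Gaussian prior with standard deviation `TAU`, a single lag-1 feature, linear interpolation as the temporally informed initialisation, and all function names (`log_post`, `rwm_step`, `mala_step`, `impute_series`). The two step functions show how an RWM update and a MALA update differ: MALA shifts its proposal along the gradient of the log posterior and corrects for the resulting asymmetry in the acceptance ratio.

```python
import numpy as np

SIGMA = 1.0   # assumed known noise sd (a simplification for this sketch)
TAU = 10.0    # assumed sd of the N(0, TAU^2) prior on the coefficients


def log_post(beta, X, y):
    """Unnormalised log posterior of a Gaussian linear model."""
    resid = y - X @ beta
    return -0.5 * resid @ resid / SIGMA**2 - 0.5 * beta @ beta / TAU**2


def grad_log_post(beta, X, y):
    """Gradient of log_post with respect to beta (needed by MALA)."""
    return X.T @ (y - X @ beta) / SIGMA**2 - beta / TAU**2


def rwm_step(beta, X, y, step, rng):
    """One Random Walk Metropolis update with a symmetric Gaussian proposal."""
    prop = beta + step * rng.standard_normal(beta.shape)
    log_alpha = log_post(prop, X, y) - log_post(beta, X, y)
    return prop if np.log(rng.uniform()) < log_alpha else beta


def mala_step(beta, X, y, step, rng):
    """One MALA update: gradient-drifted proposal plus Metropolis correction."""
    def drift(b):
        return b + 0.5 * step**2 * grad_log_post(b, X, y)

    def log_q(to, frm):
        # Log density (up to a constant) of proposing `to` from `frm`.
        d = to - drift(frm)
        return -0.5 * d @ d / step**2

    prop = drift(beta) + step * rng.standard_normal(beta.shape)
    log_alpha = (log_post(prop, X, y) + log_q(beta, prop)
                 - log_post(beta, X, y) - log_q(prop, beta))
    return prop if np.log(rng.uniform()) < log_alpha else beta


def impute_series(y, sampler=rwm_step, n_iter=2000, burn=500, step=0.05, seed=0):
    """Impute NaNs in a 1-D series.

    Returns the posterior predictive mean and sd of each imputed value,
    plus the indices they belong to. A missing value at index 0 has no
    lag and is not imputed by this sketch.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    t = np.arange(len(y))

    # Temporally informed initialisation: linear interpolation along time.
    filled = y.copy()
    filled[miss] = np.interp(t[miss], t[~miss], y[~miss])

    # Time-lagged feature: regress y[t] on an intercept and y[t-1].
    X = np.column_stack([np.ones(len(y) - 1), filled[:-1]])
    target = filled[1:]
    obs = ~miss[1:]  # rows whose target was actually observed

    beta = np.zeros(X.shape[1])
    draws = []
    for i in range(n_iter):
        # Update the coefficients using only rows with observed targets.
        beta = sampler(beta, X[obs], target[obs], step, rng)
        if i >= burn:
            # Posterior predictive draw for every missing target.
            noise = SIGMA * rng.standard_normal(int((~obs).sum()))
            draws.append(X[~obs] @ beta + noise)
    draws = np.array(draws)
    return draws.mean(axis=0), draws.std(axis=0), np.flatnonzero(miss[1:]) + 1
```

Calling `impute_series(series, sampler=mala_step)` swaps the sampler without changing anything else, which is one simple way to compare the two. The per-value standard deviation returned alongside the mean is the kind of uncertainty estimate that a single deterministic fill cannot provide. In this sketch the lag feature keeps its interpolated starting values throughout; a fuller version would refresh imputed lags between iterations.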