Because sensor failures are common, imputation, the process of estimating missing values, has become a cornerstone of time series data preprocessing. Although numerous imputation algorithms have been developed to repair such gaps, existing time series libraries provide limited imputation support. They also often cannot simulate realistic time series missingness patterns, and they do not account for the impact of imputed data on downstream analysis. This paper introduces ImputeGAP, a comprehensive library for time series imputation that supports a diverse range of imputation methods and modular missing data simulation, catering to datasets with varying characteristics. The library offers extensive customization options and supporting tools, including automated hyperparameter tuning, benchmarking, explainability, downstream evaluation, and compatibility with popular time series frameworks.
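To make the workflow described above concrete, the following is a minimal sketch of the three steps it names: simulating missingness, imputing the gaps, and scoring the result against the ground truth. It uses only NumPy and does not use or represent the ImputeGAP API. The function names (`simulate_missing_blocks`, `impute_linear`, `rmse_on_missing`), the block-wise missingness pattern, and the choice of linear interpolation as the imputer are all assumptions made for illustration.

```python
import numpy as np


def simulate_missing_blocks(series, missing_rate=0.2, block_len=5, seed=0):
    """Hypothetical helper: blank out randomly placed blocks until at
    least `missing_rate` of the points are NaN. The first and last
    points are never removed, so interpolation always has anchors."""
    rng = np.random.default_rng(seed)
    out = np.asarray(series, dtype=float).copy()
    n = len(out)
    target = int(missing_rate * n)
    while np.isnan(out).sum() < target:
        # start is drawn from [1, n - block_len - 1], so the block ends
        # at index n - 2 at the latest
        start = rng.integers(1, n - block_len)
        out[start:start + block_len] = np.nan
    return out


def impute_linear(series):
    """Baseline imputer (an assumption, not a method from the paper):
    fill NaN values by linear interpolation between observed points."""
    out = series.copy()
    missing = np.isnan(out)
    idx = np.arange(len(out))
    out[missing] = np.interp(idx[missing], idx[~missing], out[~missing])
    return out


def rmse_on_missing(truth, imputed, mask):
    """Root mean squared error, computed only where values were removed."""
    return float(np.sqrt(np.mean((truth[mask] - imputed[mask]) ** 2)))


# Synthetic ground truth: two periods of a sine wave.
t = np.linspace(0.0, 4.0 * np.pi, 200)
truth = np.sin(t)

contaminated = simulate_missing_blocks(truth)
mask = np.isnan(contaminated)
recovered = impute_linear(contaminated)
error = rmse_on_missing(truth, recovered, mask)

print(f"missing points: {mask.sum()}, RMSE on imputed points: {error:.4f}")
```

Scoring only the masked positions keeps the error from being diluted by points that were never removed. A downstream evaluation, as mentioned above, would go one step further and compare the output of a later task (for example, forecasting) on the original and the imputed series.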