Missing data are frequently encountered in practice and often significantly hamper standard time series analysis. In this paper, we introduce temporal Wasserstein imputation, a novel method for imputing missing data in time series. Unlike existing techniques, our approach is fully nonparametric: it requires no model specification prior to imputation, which makes it suitable for time series with potentially nonlinear dynamics. Its principled algorithmic implementation can seamlessly handle univariate or multivariate time series with any missing pattern. In addition, side information about the missing entries, such as their plausible range (e.g., box constraints), can easily be incorporated. As a key advantage, our method mitigates the distributional bias typical of many existing approaches, ensuring more reliable downstream statistical analysis using the imputed series. Leveraging the benign landscape of the optimization formulation, we establish the convergence of an alternating minimization algorithm to critical points. Furthermore, we provide conditions under which the marginal distributions of the underlying time series can be identified. Our numerical experiments, which include extensive simulations covering linear and nonlinear time series models and an application to a real-world groundwater dataset laden with missing data, corroborate the practical usefulness of the proposed method.
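The distributional bias mentioned in the abstract can be illustrated with a small sketch. The abstract does not name a specific baseline, so mean imputation is used here purely as an assumed example of an existing approach; the AR(1) simulator, its parameters, the missing rate, and all function names are likewise hypothetical choices for illustration. The code below is not the proposed temporal Wasserstein imputation. It only shows how filling gaps with the observed mean shrinks the variance of the imputed series.

```python
import random
import statistics


def simulate_ar1(n, phi=0.8, seed=0):
    """Simulate an AR(1) series x_t = phi * x_{t-1} + e_t (illustrative model only)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def mean_impute(series, missing_idx):
    """Replace every missing entry with the mean of the observed entries."""
    observed = [v for i, v in enumerate(series) if i not in missing_idx]
    fill = statistics.fmean(observed)
    return [fill if i in missing_idx else v for i, v in enumerate(series)]


# Assumed setup: 2000 time points, 600 of them (30%) missing at random.
series = simulate_ar1(2000)
missing = set(random.Random(1).sample(range(len(series)), k=600))

imputed = mean_impute(series, missing)

# Variance of the observed entries versus variance of the mean-imputed series.
var_observed = statistics.pvariance(
    [v for i, v in enumerate(series) if i not in missing]
)
var_imputed = statistics.pvariance(imputed)

print("variance of observed entries:   ", var_observed)
print("variance of mean-imputed series:", var_imputed)
print("ratio:", var_imputed / var_observed)
```

Because every imputed value sits exactly at the observed mean, the imputed entries contribute nothing to the sum of squared deviations, and the variance of the imputed series equals the observed variance multiplied by the observed fraction of the data. Downstream statistics that depend on the marginal distribution inherit this distortion, which is the kind of bias a distribution-aware imputation method aims to reduce.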