We consider the problem of uncertainty quantification for prediction in a time series: if we use past data to forecast the next time point, can we provide valid prediction intervals around our forecasts? To avoid placing distributional assumptions on the data, in recent years the conformal prediction method has been a popular approach for predictive inference, since it provides distribution-free coverage for any iid or exchangeable data distribution. However, in the time series setting, the strong empirical performance of conformal prediction methods is not well understood, since even short-range temporal dependence is a strong violation of the exchangeability assumption. Using predictors with "memory" -- i.e., predictors that utilize past observations, such as autoregressive models -- further exacerbates this problem. In this work, we examine the theoretical properties of split conformal prediction in the time series setting, including the case where predictors may have memory. Our results bound the loss of coverage of these methods in terms of a new "switch coefficient", measuring the extent to which temporal dependence within the time series creates violations of exchangeability. Our characterization of the coverage probability is sharp over the class of stationary, $\beta$-mixing processes. Along the way, we introduce tools that may prove useful in analyzing other predictive inference methods for dependent data.
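To make the setting concrete, the following is a minimal sketch of split conformal prediction applied to a time series with a predictor that has memory. It is an illustration under stated assumptions, not the procedure analyzed in this work: the AR(1) simulation, the least-squares fit, the absolute-residual score, and the names `simulate_ar1` and `split_conformal_interval` are all hypothetical choices made for the example. Because the data are temporally dependent, the usual exchangeability-based coverage guarantee does not directly apply to the interval it returns, and the sketch does not compute the switch coefficient or any coverage bound.

```python
import numpy as np


def simulate_ar1(n, phi=0.5, seed=0):
    """Simulate an AR(1) series x[t] = phi * x[t-1] + noise (illustrative data only)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x


def split_conformal_interval(x, n_train, alpha=0.1):
    """Split conformal interval for the point following x, using an AR(1) predictor.

    The first n_train points fit the predictor; the remaining points form the
    calibration block. Both choices are assumptions of this sketch.
    """
    train, calib = x[:n_train], x[n_train:]

    # Least-squares AR(1) coefficient (no intercept), fitted on the training block only.
    phi_hat = float(train[:-1] @ train[1:] / (train[:-1] @ train[:-1]))

    # Calibration scores: absolute one-step-ahead residuals. The predictor has
    # "memory" because each forecast uses the previous observation.
    prev = x[n_train - 1 : -1]
    scores = np.abs(calib - phi_hat * prev)

    # Conformal quantile: the ceil((1 - alpha) * (n_cal + 1))-th smallest score,
    # or +infinity when the calibration block is too small for that rank.
    n_cal = len(scores)
    k = int(np.ceil((1 - alpha) * (n_cal + 1)))
    q = np.inf if k > n_cal else float(np.sort(scores)[k - 1])

    # Interval for the next, unobserved time point.
    center = phi_hat * x[-1]
    return center - q, center + q


x = simulate_ar1(500)
lower, upper = split_conformal_interval(x, n_train=250, alpha=0.1)
print(f"90% nominal interval for the next point: [{lower:.3f}, {upper:.3f}]")
```

The interval targets nominal level $1 - \alpha$; how far the actual coverage can fall below that level under dependence is the question the results above address.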